GRADUATE  SCHOOL  OF  BUSINESS  ADMINISTRATION 

AND 

SCHOOL  OF  BUSINESS 




UNIVERSITY  OF  SOUTHERN  CALIFORNIA 

DISTRIBUTION  STATEMENT  A 

Approved  for  public  release; 
Distribution  Unlimited 

82  02  01  091 




A  STUDY  OF  FACTORS 

AFFECTING  SOFTWARE  TESTING  PERFORMANCE 


COMPUTER  PROGRAM  RELIABILITY  GROWTH 


Jeffrey  Louis  Bahr 

DEPARTMENT  OF  MANAGEMENT  AND  POLICY  SCIENCES 
UNIVERSITY  OF  SOUTHERN  CALIFORNIA 


June  1980 


This  research  was  supported  in  part  by  the 
Office  of  Naval  Research  under  Contract 
N00014-75-C-0733,  Task  No.  042-323,  Code 
434.  Reproduction  in  whole  or  part  is 
permitted  for  any  purpose  of  the  United 
States  Government. 




CONTENTS 


Page 


ACKNOWLEDGMENTS .  ii 

LIST  OF  TABLES . v 

LIST  OF  FIGURES  .  vii 


Chapter 

I.  INTRODUCTION .  1 

The  Software  Problem 
Modern  Development  Methods 
The  Role  of  Validation 
The  Role  of  Errors  in  Software  Research 
A  Comparison  of  Software  Reliability  Theory 
and  Modern  Testing  Theory 
An  Overview  of  this  Research 

II.  THEORY  TESTING  AND  PRACTICE .  16 

Introduction 

Testing  Approaches  and  Results 
Testing  Cost  and  Efficacy 

III.  SOFTWARE  RELIABILITY  THEORY  AND  PRACTICE .  53 

Introduction 

Time-Domain  Models 

Data  Domain  Models 

Estimators  of  Program  Error  Content 

IV.  EXPERIMENTAL  DESIGN  AND  PROCEDURES .  81 

Experimental  Objectives 

Sources  of  Experimental  Variability 

Experimental  Design 

Experimental  Procedures 

V.  DATA  ANALYSIS .  99 

Basic  Results  and  Descriptive  Statistics 
The  Composition  of  Program  Errors 



Chi-Square  Analysis  of  Error  Frequencies 

Development  of  Discovery  Times 

Rank  Analysis  of  Ordered  Discovery  Times 

Simple  Correlations 

Multiple  Regression  Analysis 

Analysis  of  Variance 

A  Distributional  Model  of  Discovery  Times 
Test  for  Exponentiality 
Software  Reliability  Models 
Evaluation  of  the  Order  Statistic- 
Based  Model 

VI.  FINDINGS,  CONCLUSIONS,  AND  SUGGESTIONS 
FOR  FURTHER  RESEARCH .  169 

Findings 

Conclusions 

BIBLIOGRAPHY . . .  192 

APPENDICES .  200 

A.  Questionnaire  Instructions  .  201 

B.  Program  Metrics .  210 

C.  Experimental  Program  Specifications 

and  Listings .  216 

D.  The  Construction  of  Experimental 

Test  Data  Sets .  255 

E.  Experimental  Program  Errors  and 

their  Categorization  .  258 

F.  Contents  of  the  Experimental  Packets .  266 


LIST  OF  TABLES 


Table  Page 

1.  A  Comparison  of  Modern  Testing  and  Software 

Reliability  Disciplines  .  15 

2.  Empirical  Frequencies  of  Major  Error  Categories  .  35 

3.  Empirical  Frequencies  of  Major  Error  Categories  .  36 

4.  Assignment  of  Factor  Settings  to  Subjects .  91 

5.  Breakdown  of  Self-Assessment  Ratings  by  Category  ....  105 

6.  Breakdown  of  Familiarity  Ratings .  107 

7.  Synopsis  of  Questionnaire  Variables .  109 

8.  Errors  Found:  ITAX .  111 

9.  Errors  Found:  LNPR .  112 

10.  Errors  Found:  OPTM .  113 

11.  Errors  Found:  SCNR .  114 

12.  Breakdown  of  Error  Types  Found  by  Program .  117 

13.  Chi-Square  Analysis  of  Error  Discovery  Frequencies  .  .  .  121 

14.  Error  Discovery  Times  in  Minutes  (ITAX) . 122 

15.  Error  Discovery  Times  in  Minutes  (LNPR) .  122 

16.  Error  Discovery  Times  in  Minutes  (OPTM) .  123 

17.  Error  Discovery  Times  in  Minutes  (SCNR) .  124 

18.  Analysis  of  Discovery  Time  Ranks  (ITAX)  .........  128 

19.  Analysis  of  Discovery  Time  Ranks  (LNPR)  . .  129 

20.  Analysis  of  Discovery  Time  Ranks  (OPTM) .  130 

21.  Analysis  of  Discovery  Time  Ranks  (SCNR) .  131 

22.  Simple  Correlations  of  Questionnaire  Variables 
with  Performance  Measures .  135 


Mnemonics  for  Regression  Variables 


Important  Independent  Variables  in  Regression 
Analysis  by  Program  . 


Important  Independent  Variables  in  Regression 
Analysis  of  Total  Performance  Measures  .  . 


Sums  of  MPEH  for  Group  I 


Sums  of  MPEH  for  Group  II 


Computation  of  Treatment  Comparisons 


ANOVA  Summary  for  Experiment 


Maximum  Likelihood  Estimators  for  ITAX 


Maximum  Likelihood  Estimators  for  LNPR 


Maximum  Likelihood  Estimators  for  OPTM 


Maximum  Likelihood  Estimators  for  SCNR 


Operator  Classes  for  BASIC 


Operator  Ratios  for  the  Experimental  Programs 


Complexity  Metrics  for  the  Experimental  Programs 


Error  Description:  OPTM 


Error  Description:  LNPR 


Error  Description:  ITAX 


Error  Description:  SCNR 


Breakdown  of  Error  Type  by  Program 




LIST  OF  FIGURES 



Figure 

1.  A  program  flowchart  . 

2.  A  program  control  graph  . 

3.  Program  flow  chart  . 

4.  Reduced  control  graph  G  . 

5.  Control  graph  G  with  cycles  . 

6.  Error  discovery  in  the  proposed  model  . 

7.  A  pictorial  overview  of  the  experiment  . 

8.  Distribution  of  subjects'  ages  . 

9.  Sample  distributions  of  subjects'  educational 
background  . 

10.  Sample  distributions  of  subjects'  professional 

experience  . 

11.  Sample  distributions  of  environmental  variables 

12.  An  analysis  of  self-assessment  . 

13.  Frequency  distributions  of  errors  by  program  . 

14.  Sample  distributions  of  standardized 

discovery  times  . 

C1.  Experimental  program  ITAX00  . 

C2.  Linear  programming  tableau  . 

C3.  Experimental  program  LNPR00 . 

C4.  Sample  of  fifth  degree  polynomial  . 

C5.  Flowchart  of  the  OPTM  program . 

C6.  Experimental  program  OPTM00  . 

C7.  Flowchart  of  the  SCNR  program  . 


CHAPTER  I 


INTRODUCTION 


Since the introduction of computing systems three decades ago,
software development has grown increasingly in importance and
complexity. Today the expense incurred by users in producing and
maintaining programs exceeds ten billion dollars (Phister, 1976) and
constitutes a large fraction of some corporate budgets (Dorn, 1978).
Joint revenues of independent software suppliers exceed one billion
dollars, while revenues of computer manufacturers are approximately
twice that (International Data Corporation, 1978). The increases in
software-related expense far outstrip the growth in hardware costs.
While the labor-intensive nature of software production has
contributed to its increasing cost, competitive pressure and increased
manufacturing automation have reduced the cost of hardware elements.
It is for this and other reasons that Boehm (1973) predicted that
software costs will constitute 90 percent of total system costs by
1985.

As user needs become satisfied, software systems are designed
to increasingly demanding specifications. This rapid growth in the
number and complexity of desired systems demands equivalent increases
in the number of qualified software specialists. More importantly,
it dictates a maturity in software development methodology. The current
state of "software engineering" does not exhibit this degree of
maturity, nor does any significant proportion of software workers
employ those tools and techniques which most influence the quality of
software systems.


The  Software  Problem 

Researchers  and  practitioners  in  the  information  sciences  refer 
to  the  "software  problem"  to  indicate  the  special  characteristics  of 
software  as  an  artifact,  and  the  ways  in  which  those  characteristics 
affect its production. Wegner (1978) enumerated the special properties
of software that distinguish the process of its manufacture from
that  of  other  engineering  projects: 

•  Large software products generally support a greater
variety of functions than conventionally engineered
goods.

•  There is an enormous variety of correct implementations
for a large software system, from which one must be
"selected."

•  A large software system must be constructed so as to
mitigate the cost and difficulty of its inevitable
modification.

•  Software development milestones are difficult to
establish, hence project progress is hard to assess.

Since software partly resides in the realm of ideas, the
complexity of its implementation is seldom as obvious as that of
physical artifacts. Moreover, there is some evidence that large
software systems exhibit properties substantially different from small
software systems, not just differences in scale (Boehm, McClean, &
Urfrig, 1975). Because of this, prototyping of software systems does
not generate as much useful information as a physical model which may
be developed preceding the construction of physical goods.

Society's appreciation of the software problem has been
constantly diluted by exposure to advances in the total computing
system. That is, increasing complexity in large software systems has
been partially masked by the apparent ease with which hardware
components have been expanded in the last two decades. In attempting
to utilize increased capacities for information processing, a natural
demand has been created for that complementary good which controls the
execution of hardware elements. A large percentage of the gains in
hardware, however, has come about with less additional complexity than
would be necessary to increase software system size. Reductions in cost
and increases in hardware have been substantially effected by methods
which are only linearly more complex than older ones. These methods
include miniaturization of components, and increases in resource volume
and density without an associated increase in interface complexity.
Conversely, advances and improvements in software have come about
primarily through the layering of capability after capability, which,
at each step, gently increases system complexity.

Management of software production has been hindered by the
fact that the software development process generates inherently less
visible evidence of progress than other development projects. Walker
(1978) attributed this to the fact that the primary activity in
software development is that of communication, and that without
filtering mechanisms a substantial amount of noise is injected in the
messages which become components of a software system. Visibility of
user needs is also often obscured. Users normally increase their
expectation when they are able to state their needs more explicitly in
terms of current system service. These needs may not be closely
examined; because of its intangibility, however, software is expected
to be sufficiently malleable to accommodate them. Hence, maintenance is
not always subjected to the same degree of feasibility analysis as
would accompany a modification request of a physical good.


Modern  Development  Methods 


It is the current view of researchers in software engineering
that aspects of the "software problem" can be minimized through the
use of disciplined methods of specification, design, and coding, set
within a management framework that permits the planning, measurement,
and control of development activities. The most important factor
influencing development and maintenance research has been the
rediscovery of the life-cycle metaphor applicable to software systems
(Wegner, 1978; Zelkowitz, 1978). The life-cycle approach to software
development recognizes the various stages through which the software
system can be viewed as progressing. With each phase of a software
system's life is associated a set of costs and benefits incurred in
its development, use, and maintenance.

Early software development efforts focused on the reduction of
development costs and duration. Experience in the last decade with
systems that partially or completely failed to meet specifications and
budgetary constraints forced an expansion of this myopic view. A
series of regimens has been proposed that promote the traceability of
requirements through the succession of representations of the system,
while increasing one's ability to understand a software system's
structure and function. These modern methodologies include variants of
structured programming (Dahl, Dijkstra, & Hoare, 1972), structured
design (Stevens, Myers, & Constantine, 1974), and structured analysis
(Ross & Schoman, 1977). Enlightened project management can exploit
these methodologies while instituting measurement and validation
techniques such as peer review and formal testing, as well as the
normal management reporting methods.

Software development can be viewed as a process that translates
specifications into a working system through a series of
transformations that provide intermediate representations. In making
operational the system components that satisfy the general set of
functional, economic, and performance constraints, a succession of
decisions has to be budgeted among development team participants.

Validation is necessary to assess the effects of those
decisions before the system becomes too costly to restructure. Prior to
the  development  of  a  system  code  this  validation  normally  takes  the 
form  of  reviews  in  which  users,  developers,  and  project  managers 
cooperate  to  determine  conformance  to  the  project's  plans  and  goals. 

The most expensive and time-consuming form of validation, testing,
currently and in the past, has occurred after the design has been
translated into a programming language. Alberts (1976) has estimated
that up to 50 percent of development cost is incurred during the
testing of software system elements, whereas up to 90 percent of
life-cycle costs involve maintenance to correct errors and rewrite
system codes to meet new requirements.

The  Role  of  Validation 

Validation  Research  Areas 

Areas for research in system validation may be categorized in
a variety of ways, including: by development phase addressed, manual
methods vs. automated tools, or the degree to which the results of
other disciplines are employed. Another useful dichotomy is suggested
by the alternative views of software as a static document versus
software as a dynamic system. Hence the validation of system code can
take two general forms:

•  System verification, which attempts to prove the
correctness of the implementation by formal methods, and

•  System testing, which involves the application of test
data sets that exercise a software system in a manner
representative of future and current use.

One important area in testing research is software reliability
theory, in which models are proposed that serve to approximate or
predict the frequency and composition of faults that may occur during
system operation. Because testing is costly, reliability theory gives
one a means of maximizing expected utility by indicating the level of
testing necessary to trade off optimally the cost of further quality
assurance measures with the cost of improper system execution. The
tradeoffs differ from system to system and depend, for example, upon
whether a system's errors will result in financial discomfort or loss
of human life. An excellent overview of software reliability theory
is given by Schick and Wolverton (1978).
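The tradeoff described above can be made concrete with a small sketch. The model below is purely an illustrative assumption and is not one of the reliability models surveyed in this work: the expected number of remaining errors is taken to decay exponentially with testing time, `c_test` is a hypothetical cost per unit of testing time, and `c_fail` is a hypothetical cost per error left in the delivered system.

```python
import math


def total_cost(t, n0, b, c_test, c_fail):
    """Cost of testing for time t plus expected cost of the errors that remain.

    Assumed (illustrative) model: remaining errors = n0 * exp(-b * t).
    """
    return c_test * t + c_fail * n0 * math.exp(-b * t)


def optimal_test_time(n0, b, c_test, c_fail):
    """Testing time that minimizes total_cost under the assumed model.

    Setting the derivative c_test - c_fail * n0 * b * exp(-b * t) to zero
    gives t = ln(c_fail * n0 * b / c_test) / b; if the ratio is at most 1,
    no testing is worthwhile under this model.
    """
    ratio = c_fail * n0 * b / c_test
    return math.log(ratio) / b if ratio > 1 else 0.0


# Hypothetical figures: 100 initial errors, decay rate 0.1 per hour.
cheap_failures = optimal_test_time(n0=100, b=0.1, c_test=1.0, c_fail=10.0)
costly_failures = optimal_test_time(n0=100, b=0.1, c_test=1.0, c_fail=1000.0)
```

Under these assumed figures, raising the cost of a failure lengthens the optimal testing period, which is the qualitative point made above about systems whose errors threaten life rather than finances.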


Early  Validation 


"Early  validation"  refers  to  software  development  practices 
that  attempt  to  enhance  the  fidelity  of  translation  from  problem 
statement  to  software  system  design.  Proponents  of  early  validation 
are  justified  in  believing  that  software  system  reliability  must  be 
maintained  through  the  phases  of  system  construction  that  precede  the 
programming  stage. 

Static  analysis  methods  involve  the  examination  of  documents 
representing  various  aspects  of  the  system.  Analysis  of  requirements 
statements,  formal  specifications,  and  design  documents  is  normally 
static  because  these  representations  are  seldom  machine  readable  and 
almost  never  executable. 

The importance of requirements analysis has become a major
theme in the literature of software engineering for large systems
(Bell & Thayer, 1976; Ross, 1977a). Requirements enumerate the needs
of the system's client(s) by communicating a desire for function within
some performance context. Analysis of requirements implies checking
for the desired properties such as consistency, completeness, lack of
ambiguity, and feasibility.

Formal automated systems for describing and analyzing
requirement sets are under development. Systems currently exist which
automate many of the informal checks for properties desirable in
specifications. The ISDOS system (Teichroew & Hershey, 1977) and REVS
(Davis & Vick, 1977), for example, check to ensure the uniformity of
data definition and use and consistency of views of module interface,
while providing reports facilitating manual validation of system
descriptions.


A variety of language forms and media are available for design
representation (Ross, 1977b; Caine & Kent, 1975; Stay, 1976). Like
requirements analyses, the validation of design documentation is
normally informal and restricted to review and discussion among the
development team. Such reviews include the tracing of requirements
to design, analysis of control logic, critical comparisons of
algorithms, and other analyses to verify consistency, necessity,
sufficiency, and correctness. Automated analysis techniques are
currently limited to tools that check interface consistency or
simulate structured models of system execution (Ramamoorthy & Ho,
1975).

Validation  by  Symbolic 
Verification 

Program verification refers to the use of mathematical proof
techniques to validate program correctness. A program proof requires
a goal proposition that indicates the relationships that should hold
between program variables at the conclusion of the program's execution.
This goal proposition is normally termed an output assertion;
assumptions regarding system states prior to execution (e.g., implied
relationships between subroutine parameters) are reflected by input
assertions.

A program is (totally) correct if it terminates for all valid
input, in a state for which the output assertion holds. A proof of
total correctness must treat program statements as theorems that relate
statement preconditions to postconditions. A case-by-case analysis is
usually necessary to treat the proof subsequences resulting from
decision points in the program. These notions are equally important in
top-down verification (constructive proof validating output assertion
through a series of implications originating at the input assertion)
and bottom-up verification (showing that the output assertion is true
for an input domain, part of which conforms to input assertions).
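The roles of input and output assertions can be illustrated with a short sketch. The routine below is a hypothetical example, not one of the experimental programs; its assertions are checked at run time for the inputs actually supplied, which is weaker than the symbolic proof described above but shows where each proposition would attach.

```python
def divide(dividend, divisor):
    """Integer division by repeated subtraction, annotated with assertions."""
    # Input assertion: assumed relationships between the parameters.
    assert isinstance(dividend, int) and isinstance(divisor, int)
    assert dividend >= 0 and divisor > 0

    quotient, remainder = 0, dividend
    while remainder >= divisor:
        # Loop invariant: holds on every entry to the loop body.
        assert dividend == quotient * divisor + remainder and remainder >= 0
        quotient += 1
        remainder -= divisor  # strictly decreases, so the loop terminates

    # Output assertion: the relationship that must hold at termination.
    assert dividend == quotient * divisor + remainder
    assert 0 <= remainder < divisor
    return quotient, remainder


result = divide(17, 5)
```

A proof of total correctness would establish that the output assertion follows from the input assertion for every valid input, and that the loop terminates because the remainder decreases by a positive amount on each iteration.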

On the validation continuum between program verification and
test case application, reside the techniques associated with symbolic
evaluation. Symbolic execution of a program can be performed in the
same way as actual execution: the state of program progress is
recorded by the values of variables and current instruction. The
difference in symbolic and actual execution is the form of value
assigned to variables and the evaluation of conditions dictating the
execution path. The symbolic evaluation of a path is carried out by
evaluating the sequence of, say, assignment statements occurring in
the path. Each assignment statement provides a new value for the target
variable (left-hand side) through substitution of the current symbolic
values for each variable in the source expression (right-hand side).

The branching conditions that control the selection of
execution paths (say, of an IF or DO-WHILE) can be symbolically
evaluated to form symbolic predicates. The sequence of symbolic
predicates associated with a path can be generated by assuming a truth
value for each decision point during program execution. This symbolic
system of predicates for a path can be used to assist the tester in
constructing test data. If the system of predicates for a path is
unsolvable, then the path is infeasible; otherwise the solution
describes the subset of the input domain (uninitialized variables and
system parameters) that invokes execution of that path.
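The procedure just described can be sketched in a few lines. Everything in the sketch is an assumption made for illustration: expressions are held as strings, a path is a hypothetical list of assignment and branch steps, and the "solver" is a brute-force search over a small finite domain, so a failed search indicates infeasibility only within the domain searched.

```python
import re
from itertools import product


def substitute(expr, state):
    """Replace each known variable in expr by its current symbolic value."""
    def repl(match):
        name = match.group(0)
        return "(" + state[name] + ")" if name in state else name
    return re.sub(r"[A-Za-z_]\w*", repl, expr)


def symbolic_path(path, inputs):
    """Symbolically evaluate a path; return final state and path predicates."""
    state = {name: name for name in inputs}  # inputs stand for themselves
    predicates = []
    for step in path:
        if step[0] == "assign":
            _, target, source = step
            state[target] = substitute(source, state)
        else:  # ("branch", condition, assumed truth value)
            _, condition, outcome = step
            pred = substitute(condition, state)
            predicates.append(pred if outcome else "not (" + pred + ")")
    return state, predicates


def find_test_data(predicates, inputs, domain):
    """Search a finite domain for inputs satisfying every path predicate."""
    for values in product(domain, repeat=len(inputs)):
        env = dict(zip(inputs, values))
        if all(eval(p, {"__builtins__": {}}, env) for p in predicates):
            return env
    return None


# Hypothetical path: y = x + 1; IF y > 5 (true); z = y * 2; IF z < 10 (true)
path_a = [("assign", "y", "x + 1"), ("branch", "y > 5", True),
          ("assign", "z", "y * 2"), ("branch", "z < 10", True)]
# Same statements, but the second decision is assumed false.
path_b = path_a[:3] + [("branch", "z < 10", False)]

_, preds_a = symbolic_path(path_a, ["x"])
_, preds_b = symbolic_path(path_b, ["x"])
data_a = find_test_data(preds_a, ["x"], range(-10, 11))
data_b = find_test_data(preds_b, ["x"], range(-10, 11))
```

For the first path the predicates require both x + 1 > 5 and 2(x + 1) < 10, which no integer satisfies, so the search returns no test data; for the second path the search yields an input that drives execution down that path.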


Validation  of  System  Test 


Testing involves the application of sets of data in a
controlled environment to assess the reliability of a software system
and identify errors, if any, in the system representation that may be
corrected. Testing goals usually include the determination of whether
the system meets functional and performance specifications associated
with user requirements. Requirements-oriented goals are not normally
considered a sufficient set of criteria for system testing, for they
are seldom sufficiently detailed to indicate the test data that prove
conformance. Test criteria of a larger scope are also necessary to
ensure that the system behaves correctly under conditions not
anticipated by system users.

Researchers in testing theory have attempted to identify the
characteristics of testing criteria that could identify test data whose
application could ensure system reliability to the same degree
provided by formal proofs of correctness (Goodenough & Gerhart, 1975).
These system validation criteria normally arise from one of two
sources:

•  A set of specifications that detail conditions for success
as dictated by the requirements document.

•  Criteria composed by analyzing program structure to
discover conditions that lead to successful program execution.

Data generators exist to produce test data subject to each of these
two approaches (Stucki, 1977).

Consider a test data set, S, generated to affirm conformance
to specifications, and test data set P, generated by analyzing program
structure. A mechanism for exploiting these two forms of test data
generation may analyze the union of the two data sets, S and P, to
reduce their number by eliminating redundant data while providing
useful information about test data inconsistency and the manner in
which each of the two test data sets was incomplete.
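A minimal sketch of such a mechanism follows. It assumes, for illustration only, that a test datum is a hashable value and that two data are redundant exactly when they are identical; a practical tool would need a richer notion of equivalence. The function name and sample data are hypothetical.

```python
def compare_test_sets(spec_tests, struct_tests):
    """Combine test data sets S and P and report how they differ."""
    s, p = set(spec_tests), set(struct_tests)
    return {
        "combined": sorted(s | p),   # union with duplicates eliminated
        "redundant": sorted(s & p),  # data produced by both approaches
        "only_in_S": sorted(s - p),  # cases that structural analysis missed
        "only_in_P": sorted(p - s),  # cases that the specifications missed
    }


# Hypothetical data: S from specifications, P from program structure.
report = compare_test_sets([0, 1, 100], [-1, 0, 1])
```

The two difference sets indicate the manner in which each data set was incomplete relative to the other.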

A related problem occurs when testing for regression errors.
Regression errors are those errors introduced into the system during
system modification. Brooks (1972) provides evidence that a
substantial percentage of maintenance activities introduce errors that
did not previously exist. In retesting the system after maintenance, it
is theoretically sufficient to apply a limited set of test data:
those test data that reaffirm the correctness of the modified
components and those system elements affected by the modified elements.
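The selection of such a limited set can be sketched as follows, under stated assumptions: the system is described by a hypothetical map from each component to the components that depend on it, and each test is labeled with the components it exercises. Neither structure comes from this study.

```python
def affected_components(modified, dependents):
    """Modified components plus everything that transitively depends on them."""
    affected = set(modified)
    stack = list(modified)
    while stack:
        component = stack.pop()
        for dependent in dependents.get(component, ()):
            if dependent not in affected:
                affected.add(dependent)
                stack.append(dependent)
    return affected


def select_regression_tests(tests, modified, dependents):
    """Names of the tests that exercise at least one affected component."""
    affected = affected_components(modified, dependents)
    return sorted(name for name, covered in tests.items()
                  if affected & set(covered))


# Hypothetical system: planner uses parser output, report uses planner output.
dependents = {"parser": ["planner"], "planner": ["report"], "logger": []}
tests = {"t_parse": ["parser"], "t_plan": ["planner"],
         "t_report": ["report"], "t_log": ["logger"]}
selected = select_regression_tests(tests, ["planner"], dependents)
```

In this example a change to the planner requires rerunning only the planner and report tests; the sufficiency of the reduced set rests entirely on the accuracy of the dependency map.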

The  Role  of  Errors  in  Software  Research 

Computer-related studies may be loosely dichotomized into
those subdisciplines dealing with function and those concerned with
fidelity and form. In the former class can be included the application
areas (operations research, management information systems), the
algorithmic areas (numerical analysis, time and space efficiency
analysis), and the special systems areas (operating systems, data base
management). Fields of study in the second category include those
disciplines that recognize and contend with the difficulty of
producing automata that conform to any desired function; in short,
they share a view of software that recognizes the error as a primitive
component of the development process. Because the difficulties of
proper design and implementation are apparently
application-independent, each approach to these difficulties exhibits
a characteristic generality even though each differs widely in method.
The following taxonomy is proposed to describe these approaches:

•  Error Control: the application of numerical analysis to
bound the inaccuracy of finite machines.

•  Error Prevention Methodologies: development procedures
like structured analysis, design, and programming that
prescribe means with which to facilitate the translation
of problem statement to coded solution.

•  Error Prevention Tools: programming language and support
system refinements that reduce the complexity of
implementation.

•  Error Detection Methodologies: systematized debugging
procedures like desk-checking and structured walkthroughs
that increase the likelihood of finding misconstructions.

•  Error Detection Tools: software tools like
cross-referencers, static analyzers, and dumps that point
out syntactic and procedural anomalies.

•  Fault-Tolerance: the study and design of hardware or
software systems that recover from failure with minimum
damage.

•  Proof-of-Correctness: the study and design of methods
for constructively assessing the conformance of a
procedure to its specifications.

•  Failure-Inducing Methods: test data selection methodologies
like path-based and specification-driven approaches in which
sets of program inputs are chosen to "provoke" errors to
evidence themselves.

•  Failure Behavior Modeling: the branch of Software
Reliability Theory in which the phenomenon of program failure
is viewed as stochastic and conforming, in the aggregate,
to specified distributions.

All of these approaches are associated with fields of study and
vigorously coexist as complementary solutions to problems of
anticipating, recognizing, explaining, predicting, and correcting for
program malfunction.


A  Comparison  of  Software  Reliability  Theory 
and  Modern  Testing  Theory 

The areas of research listed in the previous section have, in
general, evolved from a common base of thought. Each of the
disciplines associated with these approaches has benefited from
concepts and advances offered by another. However, the areas of Modern
Testing Theory (Failure-Inducing Methods) and Software Reliability
Theory (Failure Behavior Modeling) appear as disjoint a pair as any two
research areas in this collection.

It  can  be  surmised  that  the  differences  evident  between  these 
approaches  have  their  roots  in  the  training  and  motivations  of  the 
relatively  disparate  research  groups  that  publish  in  each  area. 

Modern  testing  theorists  and  practitioners  have  invariably  received 
training  in  the  computer  sciences,  view  programs  as  manifestations 
with  mathematical  structure,  and  consider  the  most  effective  and 
efficient  means  of  "flushing  out"  what  errors  remain.  By  comparison, 
the  majority  of  Software  Reliability  Theorists  have  a  background  in 
statistics,  view  programs  as  stochastic  event  generators,  and  concern 




Table 1

A Comparison of Modern Testing and
Software Reliability Disciplines

                        Modern Testing            Software Reliability
                        Theory

Principal activity      Finding errors            Estimating error
                                                  content

View of                 Deterministic             Probabilistic

Eventual aim            Assurance of error        Proper reliability
                        elimination               assessment

Published successes     In relatively small       In relatively large
                        software systems          software systems

Typical disciplines     Computer science          Statistics
of investigators

Typical publication     Theory and practice       Models of software
topics                  of test data selection    failure behavior

Principal publication   Software Engineering
vehicle                 Literature


CHAPTER  II 


THEORY  TESTING  AND  PRACTICE 

Introduction 

A program P may, in theory, be proved correct in one of two
ways: by formal verification or exhaustive testing. P effectively
computes a function f by correctly mapping elements of f's input
domain D into the proper element of f's output range. Formal
verification normally involves a case analysis of the program P to
prove symbolically that the collection of partial functions composing
f is faithfully represented in the structure of the program. Although
research proceeds vigorously on program verification, there is little
hope that large or complex programs will admit of proof in the near
future (Huang, 1975).

Exhaustive testing is also out of the question for all but
trivial programs, because even one 32-bit integer input requires the
application of $2^{32}$ test cases (and inspection of results) to prove
program correctness. One obvious simplification in the test procedure
follows from the consideration of program structure.
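The scale of exhaustive testing can be checked with simple arithmetic. The execution rate used below (1,000 test cases per second, including inspection of results) is an assumed figure chosen only for illustration.

```python
def exhaustive_cases(bits, inputs=1):
    """Number of distinct input combinations for `inputs` values of `bits` bits."""
    return 2 ** (bits * inputs)


def days_to_run(cases, tests_per_second):
    """Elapsed days needed to apply every case at a constant rate."""
    return cases / tests_per_second / 86400  # 86,400 seconds per day


one_input = exhaustive_cases(32)             # 4,294,967,296 cases
two_inputs = exhaustive_cases(32, inputs=2)  # 2**64 cases
days_one_input = days_to_run(one_input, tests_per_second=1000)
```

At the assumed rate a single 32-bit input already requires about seven weeks of continuous execution, and a second such input multiplies the number of cases by a further factor of $2^{32}$.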

Program execution is a matter of successive instructional
interpretation (as mirrored by a path through a program flowchart). A
substantial  reduction  in  the  size  of  the  test  data  set  is  expected  by 
applying  one  test  data  set  for  each  unique  flow  through  the  program. 


Figure  1  shows  a  sample  flowchart  associated  with  a  program  P,  com¬ 
posed  of  functional  instructions  •••,S6  and  decision  points  Dj_,  Dg, 
and  D3.  The  input  domain  I  of  this  program  may  be  partitioned  into 
^subsets  Ii,Ie, " • *,In  vhere  every  input  data  point  in  Ij  leads  to  the 
same  execution  sequence  of  the  functional  instructions  Six, S-^g, • • • ; 
that  is,  I  is  partitioned  into  subdomains,  where  each  member  of  a  sub- 
domain  leads  to  the  same  path  through  the  flowchart. 
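
The partitioning of an input domain by execution path can be sketched in a few lines. The three-decision routine `classify` below is hypothetical (it is not the program of Figure 1); it records the outcome of each decision so that inputs can be grouped by the path they follow.

```python
from collections import defaultdict


def classify(x: int, trace: list) -> str:
    """Hypothetical three-decision program; 'trace' records each branch outcome."""
    if x < 0:                 # decision D1
        trace.append("D1:T")
        x = -x
    else:
        trace.append("D1:F")
    if x % 2 == 0:            # decision D2
        trace.append("D2:T")
        label = "even"
    else:
        trace.append("D2:F")
        label = "odd"
    if x > 10:                # decision D3
        trace.append("D3:T")
        label += "-large"
    else:
        trace.append("D3:F")
    return label


def partition_by_path(inputs):
    """Group inputs into subdomains, one per distinct execution path."""
    subdomains = defaultdict(list)
    for x in inputs:
        trace = []
        classify(x, trace)
        subdomains[tuple(trace)].append(x)
    return dict(subdomains)
```

Calling `partition_by_path(range(-20, 21))` groups the 41 sample inputs into one subdomain per path; a path-testing strategy would then draw a single test point from each group.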

A more compact form of representation is useful for demonstrating
path analysis. The control graph (Figure 2) represents a program's
uninterrupted sequence of functional statements (termed segments) as
nodes, and alternatives for execution flow as arcs. The control graph
arcs can be labeled by the conditions that must be satisfied for an
arc to be selected; c_i and its complement represent the true-valued and
false-valued outcomes of the decision corresponding to D_i in the
flowchart.

Testing involves the selection of a subset T of the input
domain, the application of each data point t in T to the program, and
the determination that the program output P(t) is an acceptable result,
i.e., that P(t) is an effective computation of f(t). In a seminal
paper, Goodenough and Gerhart (1975) attempted to define the conditions
under which testing is the practical equivalent of a formal correctness
proof. Howden (1976) elucidated these results in the following way:
Consider a test strategy H, associated with a test criterion C. H is
a procedure for selecting a subset T of a program's input domain so
that the t ∈ T individually or collectively satisfy the test criterion
C. T* is a reliable test set if the successful execution of every
t ∈ T* implies the correctness of P. Because no strategy exists that
can generate T* for an arbitrary program (this is undecidable), work
proceeds on identifying particular classes of programs and program
errors for which some test criterion is reliable (Howden, 1976;
Gerhart & Yelowitz, 1976). Howden (1976) identified two general
classes of program error by indicating the reliability of path testing
in the discovery of each error class. A path-testing strategy is a
procedure for selecting one data point from each subset D_i of the input
domain, that is, the application of one test case for each unique path
through the program. Such a strategy is reliable for discovering

• computation errors, in which a subject program P has an
identical structure to the correct program P* (same paths,
same partitions D_i) but always yields a differing result
for some path;

• domain and subcase errors, in which a subject program P and
the correct program P* have the same collection of paths, but
a differing partitioning of the input domain associated with
the paths, and a differing selection of paths, which yields
differing results (Howden, 1976).

The results from the theory of testing have little practical
applicability, but do provide bounds on the kinds of errors that can be
discovered (at all) by testing criteria less comprehensive than path
testing. Much as computability theorists have proposed problem classes,
modern testing theorists have attempted to prove the inherent
difficulty of various kinds of errors.




Testing  Approaches  and  Results 

Structural  Testing 

The  identification  of  paths  is  an  integral  part  of  every  formal 
testing  procedure.  Prior  to  the  realization  that  formal  testing  was 
worth  its  cost,  programmers  constructed  test  data  sets  in  hopes  of 
discovering  errors,  thus  ensuring  proper  program  execution  for  those 
data  deemed  representative  of  real  system  use.  The  obvious  opportunity 
for  misjudgment  and  resulting  proliferation  of  errors  "in  the  field" 
has  led  many  organizations  to  formalize  testing  by  creating  test  data 
that might affirm conformance to the explicit functional and performance
requirements of the system under test. In a study of six large
programs,  Howden  (1978a)  found  that  functional  testing  uncovered  more 
errors  than  test  criteria  based  solely  upon  program  structure. 

Functional testing is effective but can fail to detect many
errors that may be detected by more methodological approaches
(Ramamoorthy & Ho, 1975). Structural testing provides a complementary
method of program validation by approaching the problem by analysis of
the program. Path testing is often impossible because the number of
paths through a program can be very large. In fact, if a cycle with
no bound on iteration exists within a program, the number of paths may
be unbounded. There are three common test criteria that attempt to
exercise relevant aspects of a program without resorting to full path
testing: segment testing, branch testing, and selective path testing.
Each testing approach attempts to identify a set of program paths whose
successful execution will increase the confidence in program
correctness.




Segment testing involves the construction of test cases to
ensure that every statement in the program is executed at least once.
If a program statement is seriously misspecified, segment testing may
identify a program error irrespective of the execution path in which
the erroneous statement is included. Segment testing derives its name
from the concept of a segment: a series of program instructions with
a consecutive execution sequence. A reduced control graph for a
program normally contains vertices associated with program segments. The
first statement in a segment is the object of one or more branching
instructions (designated by arcs). The last statement either leads
directly to a decisional statement or is a decisional statement
(depending upon the desire to implicitly or explicitly represent branching
statements in the control graph).

Figures 3 and 4 show an abstract program and control graph in
which vertices have been employed to represent segments of sequential
instructions. The paths (a,b,d,e,f,j) and (a,b,g,i,f,j) in graph G
provide a covering of the vertices of the control graph G and so
ensure the execution of every segment. An alternative testing strategy
involves the application of test data to exercise every decision branch
in a program, hence every edge in the control graph. The minimal
number of test cases required to ensure edge covering serves as an
upper bound on the number for vertex covering, for if every
intersegment branch is taken, every segment must be invoked. The path set
{(a,b,d,e,f,j), (a,b,g,e,f,j), (a,b,g,i,f,j), (a,b,d,e,f,b,g,i,f,j)} is
one edge covering for G.
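
The two covering criteria can be checked mechanically. In the sketch below the edge set of G is an assumption: it is inferred solely from the four paths quoted in the text, since the figure itself is not reproduced here. The function names are illustrative.

```python
def edges_of(path):
    """Consecutive vertex pairs traversed by a path."""
    return set(zip(path, path[1:]))


def covers_vertices(paths, vertices):
    """True if every vertex appears on at least one path."""
    return set(vertices) <= {v for p in paths for v in p}


def covers_edges(paths, edges):
    """True if every edge is traversed by at least one path."""
    return set(edges) <= set().union(*(edges_of(p) for p in paths))


# Edge set inferred from the four paths quoted in the text (an assumption).
EDGE_PATHS = [
    tuple("abdefj"), tuple("abgefj"), tuple("abgifj"), tuple("abdefbgifj"),
]
EDGES = set().union(*(edges_of(p) for p in EDGE_PATHS))
VERTICES = {v for e in EDGES for v in e}
VERTEX_PATHS = [tuple("abdefj"), tuple("abgifj")]
```

On this inferred graph the two vertex-covering paths leave the edges (g,e) and (f,b) untraversed, which illustrates why edge covering is the stronger criterion. The quoted four-path set is one edge covering, not necessarily a minimal one.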


[Figures 3 and 4: abstract program and its control graph, with vertices representing segments]


Reported results with segment and branch testing effectiveness
have varied, depending upon the sophistication of testing procedures
that were replaced by these formal methods. In one aerospace
application, branch testing purportedly eliminated 90 percent of program
faults (Brown, 1975). In another study of development costs for a
command and control system, Alberts (1976) concluded that automated
tools employing branch testing caught 60-100 percent of all program
errors two to five months earlier than they would otherwise have been
detected. Very few results have been published indicating the
effectiveness of vertex covering. Fisher (1977) employed segment testing
(vertex-covering) as a retest criterion to select test cases to reapply
after program modification. In a small study comparing a
vertex-covering test data set to an expanded edge-covering set, Brown and
Lipow (1975) found the latter much more representative of a variety
of assumptions regarding input data distributions. Segment testing
is, however, nearly always superior to intuition. For example, one
study of typical testing practice in a large software organization
revealed that on the average only one-third of all program statements
were being exercised by existing test data sets (Stucki, 1978).


Bounds on the Test Set Cardinality


It would seem that in some situations the tradeoffs between the
cost of testing and desired program reliability would lead one to opt
for a vertex covering approach. One factor in the choice of edge over
vertex covering test criteria is the additional resource required by
the former, depending upon the relative sizes of their minimal test
sets. A bound on the minimal size of a test set that provides edge
covering was inferred by McCabe (1976) in an article describing the
relationships between control graph structure and complexity. Because
of the importance of this article in describing the graph relationships
that hold for programs, a synopsis of the results follows.

A strongly connected (directed) graph is one in which any
vertex can be reached from any other by some path through the graph.
If an edge from the exit vertex (last program statement) to the entry
vertex (first program statement) is added to a control graph G as in
Figure 5, then G is strongly connected and a cycle (path from a vertex
to itself) exists from every node. A definition from graph theory
(Harary, 1972) specified that the cyclomatic number of a connected
graph, V(G), equals e - n + 1 where e is the number of edges and n is
the number of nodes. A resulting theorem (Berge, 1973) proved that
V(G) is equal to the maximum number of linearly independent circuits
in G. A set of linearly independent circuits of a graph can be
combined to form any path in a graph, and contains no circuit that can be
composed of any combination of the others. For any graph, there exist
many such sets of linearly independent circuits.

As an example, consider the augmented control graph of Figure 5,
with six vertices and nine (program) edges. When augmented with edge
10, the result specifies that there exist 10 - 6 + 1 = 5 linearly
independent circuits. Every path from a to f, when augmented with
edge 10, forms a circuit. By definition, each path can be represented
as a combination of any set of five linearly independent circuits,
which form a basis for all circuits. Because the union of the edges
composing the five linearly independent circuits exhausts the graph
edge set, a maximum of five paths need to be used to cover the graph
edges, even if each of the five paths employs a different linearly
independent circuit in its path description. Paige (1975) independently
derived this bound on the maximum number of paths necessary to provide
edge covering.

The cyclomatic number V(G) is only an upper bound on the
minimum number of paths necessary to test every program branch. If one
path can be composed of all V(G) independent cycles, then only one
test data point is necessary for edge covering.
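
The bound is simple to compute from an edge list. The following sketch applies the formula V(G) = e - n + 1 after adding the exit-to-entry edge; the small if-then-else graph used to exercise it is hypothetical and is not the graph of Figure 5.

```python
def cyclomatic_number(n_edges: int, n_nodes: int) -> int:
    """V(G) = e - n + 1 for a strongly connected graph."""
    return n_edges - n_nodes + 1


def augmented_cyclomatic_number(edges, entry, exit_):
    """Add the exit-to-entry edge, then apply e - n + 1.

    Edges are held in a set, so parallel edges between the same pair of
    vertices are counted once (a simplification of this sketch).
    """
    augmented = set(edges) | {(exit_, entry)}
    nodes = {v for e in augmented for v in e}
    return cyclomatic_number(len(augmented), len(nodes))


# Hypothetical if-then-else graph: a branches to b or c, both rejoin at d.
IF_THEN_ELSE = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")]
```

With the counts quoted in the text, `cyclomatic_number(10, 6)` reproduces the value 5, and the hypothetical if-then-else graph yields 2, one path per branch outcome.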

Other  Structural  Tests 

Researchers and practitioners generally employ branch testing
as a minimum measure of coverage (Brown & Lipow, 1975; Osterweil &
Fosdick, 1976; Miller, 1976). It is often the case, however, that a
larger set of test data is employed to exercise critical or interesting
paths or to exercise particular segments under a variety of conditions
to assess a program's reliability. For example, one test coverage
criterion in Mills (1971) and Ramamoorthy, Kim, and Chen (1975) is
called structured testing and has been shown to compare favorably to
other test data selection criteria (Howden, 1978). As an approximation
to full path-testing, structured testing dictates that all paths be
tested that require fewer than an arbitrary number of iterations of
any cycle. Variations in this strategy are often employed in cases
where applying this criterion would leave part of a module untested
because of complicated interdependencies between an outer loop index
and inner loop bounds.
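
A bounded path enumeration in the spirit of structured testing can be sketched as a depth-first walk of the control graph. Limiting the number of times any single edge may be traversed is an assumption of this sketch, used as one simple stand-in for "fewer than an arbitrary number of iterations of any cycle"; the loop graph is hypothetical.

```python
def bounded_paths(graph, entry, exit_, max_traversals=1):
    """Enumerate entry-to-exit paths using no edge more than max_traversals times."""
    results = []

    def walk(node, path, used):
        if node == exit_:
            results.append(tuple(path))
            return
        for nxt in graph.get(node, ()):
            edge = (node, nxt)
            if used.get(edge, 0) < max_traversals:
                used[edge] = used.get(edge, 0) + 1
                path.append(nxt)
                walk(nxt, path, used)
                path.pop()          # backtrack
                used[edge] -= 1

    walk(entry, [entry], {})
    return results


# Hypothetical graph: b is a loop header, c the loop body, d the exit.
LOOP = {"a": ["b"], "b": ["c", "d"], "c": ["b"]}
```

With a bound of one traversal per edge the loop graph yields two paths (zero and one iteration of the loop); raising the bound to two adds the two-iteration path. Without any bound the enumeration would not terminate.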

Another popular testing strategy requires test data that reside
at boundary points of the input values. Other special values may be
submitted to ensure that

• related data have distinct values

• certain arithmetic expressions involve zero-valued
arguments

• nonnumeric inputs are submitted for each of the significant
values that they may assume.
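
For an integer input with a known valid range, boundary and special values can be generated directly. The sketch below rests on assumptions not stated in the text: the input is an integer, its valid range [lo, hi] is known, and points just outside the range are also of interest.

```python
def boundary_values(lo: int, hi: int):
    """Candidate test points at and adjacent to the bounds of [lo, hi]."""
    candidates = {lo - 1, lo, lo + 1, hi - 1, hi, hi + 1}
    if lo <= 0 <= hi:
        candidates.add(0)  # zero-valued argument, when zero is a legal input
    return sorted(candidates)
```

For example, `boundary_values(1, 10)` returns `[0, 1, 2, 9, 10, 11]`.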

Test  Data  Selection 

The problem of test data selection for structural testing is by
no means solved upon the identification of the program paths to be
tested. In fact, the problem of generating test data to execute any
specified statement in a program is formally unsolvable. If a means
can be found to ascertain a finite bound on program cycles (arbitrary
bounds on loops), then a more reasonable problem results. The test
data selection problem can be viewed as the identification of a data
point x that invokes the execution of the path p(x) consisting of a
finite number of functional statements sequenced by implicit
(next-statement) and explicit (IF-THEN-ELSE, DO-WHILE) control structures.

A path is represented in a control graph by a sequence of
edges. Recall that these edges may be labeled with conditions that
serve as branch selectors when more than one edge emanates from a
vertex. These conditions parallel the conditions found in corresponding
program statements such as the IF-THEN-ELSE, DO-WHILE, and CASE
statement. The input domain I associated with path p can be defined by the
conjunction of these conditions. This conjunction represents a collection
of predicates for which at least one desired test data point must
serve as a solution. In general, finding this point is also an
undecidable problem, even if the conditions are merely inequalities.
Moreover, the conjoining of conditions into a path predicate is
nontrivial, as it requires considering the effects of changes in the
predicate arguments (free variables) made by assignment and other types
of statements within segments.

The problem of path selection (by segment-covering,
branch-covering, or other method) cannot in most instances be divorced from
the determination of input data generating a path. This is due to the
fact that abstract path selection based upon covering ignores the
conditions that must hold for a path to be exercised. Many programs have
a large number of infeasible paths whose path predicates evaluate to
false. Howden (1978b) believed the presence of infeasible paths to be
the most serious problem in path selection. Moranda (1978), in
randomly exercising a numerical algorithm, found far fewer paths than
would be expected if each path were feasible and equiprobable. Many
of the designers of test generation systems ignore the possibility of
infeasible paths during path selection, preferring to iterate between
path selection and infeasibility determination until a covering is
found (Miller, 197^). Many articles on test data generation merely
ignore the problem (Hoffman, 1976; Miller & Melton, 1975).
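
The interaction between assignments, path predicates, and infeasibility can be illustrated with a small sketch. A path is modeled here as a list of assignments and assumed branch conditions; this representation, the two example paths, and the bounded search are all hypothetical choices made for illustration, and they sidestep the undecidability of the general problem only by restricting the search to a small finite domain.

```python
from itertools import product


def follows_path(path, inputs):
    """Replay a path concretely; True if every branch condition holds."""
    env = dict(inputs)
    for kind, *rest in path:
        if kind == "assign":
            var, expr = rest
            env[var] = expr(env)          # assignments change later predicates
        elif kind == "assume":
            (cond,) = rest
            if not cond(env):
                return False
    return True


def find_test_point(path, names, domain):
    """Search a small finite domain for an input that drives the path."""
    for values in product(domain, repeat=len(names)):
        candidate = dict(zip(names, values))
        if follows_path(path, candidate):
            return candidate
    return None


# Hypothetical paths: the assignment y = x + 1 sits between two decisions.
FEASIBLE = [
    ("assume", lambda e: e["x"] > 3),
    ("assign", "y", lambda e: e["x"] + 1),
    ("assume", lambda e: e["y"] < 10),
]
INFEASIBLE = [
    ("assume", lambda e: e["x"] > 3),
    ("assign", "y", lambda e: e["x"] + 1),
    ("assume", lambda e: e["y"] < 4),     # contradicts x > 3 once y = x + 1
]
```

A result of `None` from the bounded search shows only that no test point exists within the searched domain. In the second example the contradiction happens to hold for all integers, so the path is genuinely infeasible.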

Path infeasibility determination is a natural consequence of
path predicate determination in test data generators. Test data
generators can be roughly categorized by

• the degree to which they aid in path selection, and

• the degree to which they aid in path predicate evaluation.

RSVP, one of the earliest such systems, produced a branch-covering on
FORTRAN programs and partially evaluated the path predicate by folding
constant expressions and dropping obviously redundant path constraints
(Miller & Melton, 1975). The resulting series of inequalities
(virtually all FORTRAN predicates involve comparisons of numerical data)
is printed out for the user to solve. PET, a Program Evaluation Tool
developed by Stucki (1977), instruments a FORTRAN program with probes
that indicate segment and branch usage. A testing system at TRW
Systems, Inc., aids in path selection and attempts to evaluate path
predicates composed of equalities by algebraic methods (Hoffman, 1972).
Hence it appears that the computing community is a number of years
away from developing systems that generate sets of nonredundant test
data that cover all feasible paths.

Symbolic  Execution 

ATTEST, a system codeveloped by Clarke (1978), exhibits the
most sophisticated approach to path selection and feasibility
determination. A user may select from a set of testing criteria,
including segment-covering, branch-covering, structured testing, and
full path selection. ATTEST employs a linear programming algorithm
to determine the minimum and maximum values for each loop in the path.
The traversal of the control graph that searches for paths is
heuristically driven and integrated with a "solver" routine that attempts to
evaluate path predicates as they are composed. The solver employs
linear programming methods on linear predicates and heuristic methods
for more complex conditions.

In operation ATTEST closely resembles many aspects of a
symbolic execution system. A symbolic execution system is usually
initiated at the root (first executable statement) of a program. A
set of input variables of indeterminate value is identified (these are
subroutine parameters, COMMON variables, and the subjects of input
statements). Every other program variable value is maintained as a
function of the input variables. Hence every decision point's
condition can be reexpressed in terms of input variables. During an
interactive session with a symbolic execution system, a user is asked to
specify which branch to take at each decision point. Each decision
branch adds one more decision condition to a set that, when conjoined,
specifies the path predicate, hence the input conditions that generate
a path to the particular state at which the symbolic execution is
stopped.
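
The mechanism just described can be sketched for a straight-line sequence of assignments and decisions. This is a toy, not a rendering of any of the systems named in this chapter: expressions are plain strings, substitution is textual, the list of branch choices stands in for the interactive user, and both outcomes of a decision are assumed to lead to the same following statement.

```python
import re


def substitute(expr: str, state: dict) -> str:
    """Rewrite expr so that it mentions input symbols only."""
    def replace(match):
        name = match.group(0)
        value = state.get(name)
        if value is None or value == name:   # unknown name or an input symbol
            return name
        return f"({value})"
    return re.sub(r"[A-Za-z_]\w*", replace, expr)


def symbolic_run(statements, inputs, choices):
    """Follow the chosen branches and return (state, path predicate)."""
    state = {name: name for name in inputs}   # inputs stand for themselves
    predicate = []
    picks = iter(choices)
    for kind, *rest in statements:
        if kind == "assign":
            var, expr = rest
            state[var] = substitute(expr, state)
        elif kind == "branch":
            (cond,) = rest
            cond = substitute(cond, state)
            predicate.append(cond if next(picks) else f"not ({cond})")
    return state, predicate


# Hypothetical program fragment with input x.
PROGRAM = [
    ("assign", "y", "x + 1"),
    ("branch", "y > 5"),
    ("assign", "z", "y * 2"),
    ("branch", "z < 20"),
]
```

Taking the true branch at the first decision and the false branch at the second, `symbolic_run(PROGRAM, ["x"], [True, False])` yields a path predicate expressed entirely in terms of x. Any x satisfying both conjuncts, such as x = 9, drives the program down that path.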

In EFFIGY, a symbolic execution system developed by King
(1976), the user may save state information so that he may back up to
an earlier decision point to take a different path of execution.
Similar systems are under development elsewhere (Howden, 1977; Boyer,
Elspas, & Levitt, 1975).

Data  Flow  Analysis 

Although it falls outside the definition of testing, data flow
analysis is a powerful validation method often integrated within
symbolic execution systems. Data flow analyses involve the determination
of when program variables are referenced and defined within an
execution sequence. Data flow analysis can be performed during path
enumeration or symbolic execution by determining, for each analyzed
execution sequence, whether variables are used prior to their definition
(assignment) or assigned a value and then not used. The identification
of these and other data flow anomalies is the function of data flow
analysis systems. Both the DAVE and ATTEST systems referenced above
incorporate data flow analyses as an option during path generation.
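
The two anomalies named above can be detected with a single pass over one execution sequence. The event encoding below, pairs of 'def' or 'ref' with a variable name, is an assumption of this sketch and is not the representation used by DAVE or ATTEST.

```python
def data_flow_anomalies(sequence, inputs=()):
    """Scan one execution sequence of ('def' | 'ref', variable) events."""
    defined = set(inputs)       # inputs are treated as defined on entry
    unused = {}                 # variable -> index of a definition not yet referenced
    anomalies = []
    for i, (action, var) in enumerate(sequence):
        if action == "ref":
            if var not in defined:
                anomalies.append(("undefined-reference", var, i))
            unused.pop(var, None)
        elif action == "def":
            if var in unused:   # redefined before the earlier value was used
                anomalies.append(("unused-definition", var, unused[var]))
            defined.add(var)
            unused[var] = i
    # Definitions still unreferenced at the end of the sequence.
    anomalies.extend(("unused-definition", var, i) for var, i in sorted(unused.items()))
    return anomalies
```

Values that legitimately leave the routine, such as output parameters, would be reported as unused by this sketch; a practical analyzer would need to exempt them.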

Software Error Classes

A number of researchers have attempted to categorize program
error types. The most extensive study of this kind was conducted by
TRW Systems, Inc. (Thayer, Lipow, & Nelson, 1976), and included an
analysis of error reports collected during the development of four
large software systems. A great number of automated analysis tools
was brought to bear on the systems' source code and on the
error-recording documentation generated during the course of systems
development. A very extensive list of error classes and subclasses resulted,
including

•  Computational  errors,  resulting  from 

— the  improper  coding  of  formulae  used  in  problem  solving 
(e.g.,  quadratic  roots  equation) 

— bookkeeping  (e.g.,  array  index  or  record  number  calculation) 

• Logic errors, in which

— missing, excess, or erroneous conditions were discovered
— functional sequencing was incorrect for some case
— an algorithm realized a function other than what was desired
— physical characteristics of the problem were misunderstood
(e.g., memory requirements, time duration, device interface)

• Input/output errors

— improper  format  or  timing  of  output 
— excessive  or  insufficient  results  display 

•  Data  handling  errors 

— data  structures  undefined  or  uninitialized 

— improper  data  types  used  for  an  operation 

— attempts  to  access  data  outside  of  legitimate 
address  space  or  data  structure  bounds 

•  Interface  errors,  in  which  global,  shared,  or  passed  data 
were  incompatible  or  improperly  referenced  from  routine 
to  routine,  routine  to  database,  or  system  to  external 
environment 

•  Standards  violations,  in  which  nonconformance  of  program 
text  to  coding  standards  was  discovered  or  incomplete 
documentation  was  provided. 

Because these "errors" were recorded under all conditions of
improper system performance or representation, some of the designated
errors, in fact, describe "failures," including failures to meet
requirements and failures resulting from operator error. The sources of
these errors were attributed to development phases (requirements,
design, coding, maintenance) depending upon the time when
error-producing decisions were first made (Thayer et al., 1976). The
frequency of the major error categories for three of the four projects is
shown in Table 2. The category "Other" was used to group errors
resulting from improper system analysis or design.




Table 2

Empirical Frequencies of Major Error Categories

                                   Percentage     Percentage
                      Number of    of Code        of Total
Error Category        Errors       Based Errors   Errors

Computational            552           12              9
Logic                  1,333           29             20
Input/output             911           20             14
Data handling            887           19             14
Interface                932           20             14
Total code based       4,615          100             71
Standards                391                           6
Other                  1,470                          23
Total                  6,476                         100

Note. From Software reliability study by Thayer, Lipow, &
Nelson. Redondo Beach, Calif.: TRW, March 1976, P*

Rubey, Dana, and Biche (1975) analyzed a collection of data
obtained from a variety of (undisclosed) medium-scale assembly language
development efforts. In developing an empirically-based estimating
formula for validation cost, the authors reviewed the type and
frequencies of errors discovered during the validation phase. The error
classes defined by these researchers coincide approximately with those
employed in the TRW study as shown in Table 3.



Table 3

Empirical Frequencies of Major Error Categories

                                        Percentage     Percentage
                           Number of    of Code        of Total
Error Category             Errors       Based Errors   Errors

Computational                 113           30              9
  Wrong operation (69)
  Poor scaling (22)
  Other (22)
Data access                   120           32             10
Logic/sequencing              139           37             12
  Wrong condition (28)
  Sequencing (98)
  Other (13)
Total code based              372          100             31
Specification                 485                          40
Standards violation           118                          10
Documentation                  96                           8
Total                       1,202                         100

Note. From Quantitative aspects of software validation by Rubey,
Dana, & Biche, IEEE Transactions on Software Engineering, June 1975;
SE-1(2),


One significant finding of this study was the apparent usefulness of
static analysis (e.g., data flow analyses, cross reference checks,
code review). The application of static analysis techniques resulted
in the discovery of about half of the errors found, and these errors
were discovered early in the validation effort at less cost than
execution-based testing. Alberts (1976) cited a similar efficacy for
static analysis, reporting that 46 percent of logic and coding errors
were detectable by manual inspection and formal static methods.




In a recent paper, Fujii (1977) outlined the validation
activities performable in each phase of system development. Her
classification of implementation errors includes logical/branching, data
accessing, and sequencing, defined in a manner very similar to the studies
mentioned above. Fujii's analysis refers to medium to large programs
with high reliability requirements developed under contract to the
Department of Defense.

Very few data have been published on error frequencies in small
software. Because small projects seldom have the budgetary luxury of
gathering development statistics, one can only infer from intuition
and other researchers' comments that the error composition for small
software is most heavily concentrated in physical design and
implementation flaws, with a higher frequency of computational, logic, and
sequencing errors. Over a period of years, Howden (1976) studied the
efficacy of various testing procedures on small programs written in
higher-level languages. Each of the following testing approaches was
employed and the number of errors discovered was tallied for each:


Testing Approach                 Number of Errors
                                 Discovered

Path testing                           18
Branch covering                         6
Structured testing                     12
Special values testing                 17
Interface testing                       2
Anomaly detection                       4
Specification-based testing             1

Total errors                           28




The relative effectiveness of path-based and special values (boundary
points) techniques and lesser efficacy of formal specification,
interface, and anomaly-detection approaches supports the contention that
errors in requirements and interface definition are less prevalent in
the small software development domain. The high proportion of "simple"
implementation errors in small systems may justify optimism that small
projects may be more mechanically tested at a reduced cost.
Computation errors can typically be found by segment-covering approaches,
because any path incorporating such an error may lead to failure.
Logic errors are amenable to discovery through branch and special
values testing, whereas sequencing errors may often be found by
structured and other limited path testing. Each of the testing approaches
can be facilitated by automated tools providing program
instrumentation, test data selection, and failure detection (Ramamoorthy & Ho,
1975).

Although  testing  serves  to  purge  programs  of  errors,  it  also 
operationalizes  the  assessment  of  reliability.  Error  discovery  has 
always  been  a  phenomenon  greeted  with  mixed  emotions  by  developers. 
Although  the  reliability  of  the  code  has  undoubtedly  been  enhanced  by 
error  removal,  one's  confidence  in  program  correctness  may  also  suffer, 
depending  upon  the  timing,  frequency,  and  circumstances  associated  with 
the  failure. 

Failure circumstances are primarily dichotomized into the
differences between testing and operational environments. Failures
are provoked in testing, the object of a formal search for unfaithful
translation of specifications. Failures in operations are avoided as
much as possible, however, and any model of fault occurrence must
distinguish between these program execution environments.

The difference between testing and operational environments is
often approached by considering the representativeness of the test data
set applied to a software package. Littlewood (1979) stated that
reliability models have not yet fully dealt with the question of
representativeness. Thayer et al. (1976) attempted to address test
representativeness directly by modifying a data-domain reliability estimator
by an hypothesized operational profile. Brown and Lipow (1975)
proposed the use of the χ² statistic to measure the conformance of
structural test sets with an assumed input distribution, and showed how
additional test data points can help converge the testing and
operational input profile. Testing theorists appear to ignore the issue
altogether, apparently feeling that guarantees of testing thoroughness
preclude the need to consider the distribution of program use (Howden,
1978; Ramamoorthy & Ho, 1975).
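
A conformance measure of this kind can be sketched with the standard Pearson chi-square statistic. The code below is the textbook formula and is not claimed to be the exact formulation used by Brown and Lipow; the input classes and the operational profile are hypothetical.

```python
def chi_square(observed, profile):
    """Pearson's statistic: sum over classes of (O - E)^2 / E.

    observed maps each input class to the number of test points drawn
    from it; profile maps each class to its assumed (nonzero) probability
    of occurring in operational use.
    """
    total = sum(observed.values())
    stat = 0.0
    for cls, p in profile.items():
        expected = total * p
        stat += (observed.get(cls, 0) - expected) ** 2 / expected
    return stat


# Hypothetical operational profile over three input classes.
PROFILE = {"A": 0.5, "B": 0.3, "C": 0.2}
```

A test set that spreads 30 points evenly over the three classes gives a statistic of 40/9, or about 4.44, against this profile, whereas a set of 15, 9, and 6 points matches the profile exactly and gives zero. Adding test points to under-represented classes moves the statistic toward zero.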

The differences in approaches hinge upon the theorist's
preference for prediction and fear of failure. Testing costs are directly
related to testing duration; testing duration is, in turn, dependent
upon the growth of confidence in a program and can be enhanced by
predictors of residual error count, time-to-next-failure, and failure
rate. In a sense, testing proceeds until the cost of running without
failure becomes prohibitive.

The situation in the operational environment is exactly
opposite. Whereas the deferral of error correction to the operational
phase may lead to increased debugging costs, it is the cost of failure
that motivates formal testing. Littlewood (1979) proposed that we are
currently in the second stage of reliability model development, the
fourth of which will explicitly include operational failure costs in
the lifecycle costs that affect development decisions. He, as well
as other researchers, appreciates the problems of failure cost
prediction. Unlike errors, whose cost of removal depends upon their subtlety
and complexity, the cost of failure depends greatly upon the
circumstances of program use. In real development environments the
possibility of costly failures is considered and often motivates the systematic
testing of infrequently used program functions. Viewed in this context,
thorough nonrepresentative testing becomes a reasonable approach to
risk aversion, even though the resulting reliability estimators may
poorly predict testing duration and/or operational failure rate.


Testing  Cost  and  Efficacy 


The  Testing  Problem 


Software developers are frequently faced with the problem of
determining how much testing should be performed on a software good.
One may attempt to end up with the same cost distribution per
development phase as that recommended in the literature (Zelkowitz, 1978;
Boehm, 1973), but these percentages vary a great deal and are usually
applicable only to large, well-organized development teams working on
large, critical systems. Most developers still determine a testing
budget by informal estimation based upon experience with similar
systems. As testing tools and strategies become more abundant, test
planning becomes more complex, for the cost and efficacy of alternative
approaches must be considered.


A variety of phase-dependent techniques has been discussed in
previous sections. Although early validation of requirements and
design is an important and worthwhile activity, this section focuses
on the decision making in the code-testing phase. Code testing is a
development step that no software developer can ignore, and so an
analysis of the alternatives available in this phase is expected to
have the most general applicability.

Code testing guidelines are particularly useful to the developers
of small- to medium-scale software goods. The requirements for
small software are usually well-defined and the moderate budgets
allocated for these projects often preclude the use of formalisms in
organization and representation that admit of modern validation techniques.
Testing strategies and techniques, on the other hand, are better
understood, generally more intuitive, and span a wide range of
sophistication that permits the developer to choose a degree of formality
suitable to the importance of the system and its reliability
requirements.

Testing costs can be categorized by considering the steps
involved in code validation. The transition from design to program
code results from decisions including choice of programming language,
algorithms and data structures, and target machine. Program validation
procedures commence with examination of the coded program as
represented in a human-readable (coding sheet) or machine-readable form
(cards, disk file). Syntactic validation is necessarily performed by a
translator (compiler, interpreter) invoked to reduce the program
text to an executable form. A translator reports on syntactic
correctness by determining the program's conformance to the grammar associated
with the programming language. Although sophisticated translators can
perform many types of static analyses to assess semantic correctness,
it is assumed here that separate tools are required to perform data
flow analysis, reachability analysis, and other static validation
procedures.

Sampling Strategies

When a program is free of syntactic errors, it remains to be
tested for operational conformance to its desired function. Testing
involves the application of data and this application involves the
selection of representative inputs. This choice may be termed a
sampling strategy because it requires the selection of one or more
data points from the (generally) large set of possible program inputs.
In the discussion in previous sections, the set of possible data inputs
has been designated by the set D. Each element of D is a collection
of inputs that forces a program P to proceed from an initial state to
one or more states that signify completion. A sequence of program
actions from initiation to termination is considered an execution or
run and serves as the basis of analysis for correctness.

In some programs or systems this unit of execution may be
difficult to assess: real-time code may continuously monitor and react to
environmental stimuli; query systems may interact continuously with a
variety of users; operating systems may continuously provide system
services while controlling access to system resources. In nearly all
cases, however, a run or complete execution can be defined. Program
terminations serve as obvious boundaries for simple programs, whereas
runs in continuous systems may be defined as execution sequences
between transaction submittals or service requests.

Discovering a relevant definition for program inputs may also
prove elusive. One may, however, simply consider the time-ordered data
values submitted to a program during a run and designate each sequence
of input data values $d = (d^1, d^2, \ldots)$ as an element of the input
domain, D. In simple systems the ordering of the $d^k$'s is unimportant,
where, for example, each $d^k$ represents a separate value and the entire
set is submitted simultaneously at program start. A program may, however,
acquire inputs during the course of execution and sequences of $d^k$'s
may be highly interdependent.

The cardinality of D is dependent upon the data types of the
$d^k$'s and the relationships among them. In simple programs each
constituent $d^k$ of an input data point d may be a separate variable of a
common data type (say, integer or character). The cardinality of D,
then, is at most the product of the cardinalities of each associated
data type, where data type indicates a set of permissible variable
values. This maximum cardinality is often bounded by
interrelationships among program inputs. For example, in matrix multiplication
the input matrices must be conformable (L by M and M by N) to be
proper inputs.
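
As a rough illustration of this bound, the following sketch enumerates a toy input domain. It assumes, purely for illustration, that an input point consists only of the four dimensions of two matrices, each drawn from the hypothetical range 1 to 3; the names `types`, `upper_bound`, and `domain` are not taken from the study.

```python
from itertools import product

# Hypothetical data types: each dimension may take the values 1, 2, or 3.
rows_a = range(1, 4)  # L
cols_a = range(1, 4)  # M
rows_b = range(1, 4)  # must equal M for the matrices to be conformable
cols_b = range(1, 4)  # N
types = [rows_a, cols_a, rows_b, cols_b]

# Upper bound on |D|: the product of the cardinalities of the data types.
upper_bound = 1
for t in types:
    upper_bound *= len(t)

# The interrelationship among inputs (conformability) shrinks the domain.
domain = [d for d in product(*types) if d[1] == d[2]]

print(upper_bound, len(domain))  # 81 27
```

Under these assumptions the constraint removes two thirds of the unconstrained points, which suggests how strongly input interrelationships can tighten the product bound.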

Other bounds may be placed upon individual input values as
well as permissible combinations. A programming language may provide
only general data types (for example, integer) when an input value is
specified to reside in a restricted range (say, gross income in a tax
computation routine). One must recognize the collection of constraints
on input values as this collection is important in determining the
permissible execution paths through the program.

To reiterate, a sampling strategy requires the selection of
sample points for application to a program P to attempt computation
of the program function f. Informal sampling strategies abound,
including pseudorandom, functional, descending-criticality, and
most-obvious-first. A testing procedure is considered pseudorandom if the
selection of test data is performed in a manner in which no methodology
is apparent, and for which it appears that input sets are selected from
D with equal probabilities. True random sampling may be performed,
of course, by employing a mechanism to choose from D
nondeterministically with the resulting sample d distributed according to the
mechanism's selection function.

Functional testing results from the application of test sets to
determine the conformance of the program to its specifications. When
the specifications imply some easily recognizable classes of desired
program behavior, functional sampling attempts to confirm the
correctness of the program functions associated with the desired behavior. A
partitioning of the input domain is usually suggested by the set of
desired functions (say, different transaction types in a bank's teller
system) and functional testing requires a selection of data points
within each partition. The selection within partitions may be
performed by any other sampling technique.
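
The combination of functional partitioning with random selection inside each partition might be sketched as follows. The teller-system partitions, the function name `functional_sample`, and the fixed seed are all hypothetical choices made for illustration; they are not part of the study's procedures.

```python
import random


def functional_sample(partitions, per_partition, seed=0):
    """Pick up to `per_partition` points at random from each partition."""
    rng = random.Random(seed)  # seeded so that the test set is repeatable
    sample = {}
    for name, members in partitions.items():
        k = min(per_partition, len(members))
        sample[name] = rng.sample(members, k)
    return sample


# Hypothetical partitioning of the input domain by transaction type.
partitions = {
    "deposit": [("deposit", amount) for amount in (1, 50, 10_000)],
    "withdrawal": [("withdrawal", amount) for amount in (1, 50, 10_000)],
    "balance_inquiry": [("balance_inquiry", None)],
}

tests = functional_sample(partitions, per_partition=2)
for name, points in tests.items():
    print(name, points)
```

Any other within-partition selection rule (boundary values, descending criticality) could replace the call to `rng.sample` without changing the outer loop.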

Sampling in order of decreasing criticality refers to the
informal notion of testing first those program functions whose failure
would preclude further testing, or which, if found incorrect, could
prove costly. As an example of the former case, a frequently used
subroutine may be validated first to allow a confident assessment of error
types in other program parts that employ that routine (this is the
motivation for bottom-up testing). The latter case refers to a test
ordering in which a preference function is imposed upon the kinds of errors
allowable in the program. One may assess the need for extensive testing
of life support software for a space capsule, but commit fewer resources
to testing less critical functions (e.g., caloric intake monitoring).

Closely associated with sampling by decreasing criticality is
the modeling of sampling by removal of the most obvious errors first.
In practice, a large number of errors is discovered in initial testing
phases, with error discovery decreasing as time goes on. This
phenomenon results because some error types are more obvious than others: for
example, those involving a single statement rather than combinations
of segments. Sets of related errors made uniformly throughout the
program (e.g., improper declarations or subroutine calls) also tend to
be discovered early and, depending upon the error discovery model, may
be counted as a large number of single errors. Time-domain models
that employ a decreasing hazard function can account for this
phenomenon (see Chapter III).

Formal sampling strategies require a stratification of the data
domain. Structural testing involves a stratification with partitioning
induced by the structure of the program. Structural testing is
composed of three steps: structure determination, partitioning, and test
data selection. Structure determination involves the analysis of
possible program execution paths and is normally performed by
constructing a graph that indicates program segments and precedence
relationships among them. Partitioning involves the recognition of
disjoint subsets of D, each of which is associated with some aspect of
program structure. Segment covering is one such sampling criterion
in which the desired test data set must exercise every program
statement. Branch covering requires that a set of data invoke the transfer
between every connected segment pair. Structured testing requires the
execution of every basic path.
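
The difference between segment covering and branch covering can be seen on a very small program graph. The sketch below assumes a hypothetical graph for an if-then construct with no else part; the node names and helper functions are illustrative only.

```python
def covered_segments(paths):
    """Segments (nodes) exercised by a set of execution paths."""
    return {node for path in paths for node in path}


def covered_branches(paths):
    """Branches (edges) exercised by a set of execution paths."""
    return {(a, b) for path in paths for a, b in zip(path, path[1:])}


# Hypothetical program graph: entry -> test -> [then ->] exit.
edges = {
    ("entry", "test"),
    ("test", "then"),
    ("then", "exit"),
    ("test", "exit"),
}
segments = {node for edge in edges for node in edge}

# One path through the "then" part exercises every segment...
paths = [["entry", "test", "then", "exit"]]
segment_cover = covered_segments(paths) == segments
branch_cover = covered_branches(paths) == edges

# ...but branch covering also needs the path that skips it.
paths.append(["entry", "test", "exit"])
branch_cover_after = covered_branches(paths) == edges

print(segment_cover, branch_cover, branch_cover_after)  # True False True
```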

The  usual  outcome  of  partitioning  is  a  specification  of  the 
set  of  paths  that  satisfy  a  covering  criterion.  With  each  path  is 
associated  a  path  predicate  characterizing  the  property  of  the  domain 
subset  whose  members  force  the  path's  execution.  To  operationalize 
these  partitionings,  one  or  more  data  points  must  be  defined  for  each 
subset.  As  discussed  above,  this  determination  may  be  nontrivial, 
requiring  the  solution  of  the  path  predicate  in  terms  of  the  input 
variables. 
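
For a small, finite input domain, a data point satisfying a path predicate can be found by exhaustive search, as in the sketch below. The predicates, the domain bounds, and the function `find_point` are hypothetical; exhaustive search is used here only because the domain is tiny, and it is not the approach of any particular test data generation system.

```python
from itertools import product


def find_point(predicate, domain):
    """Return the first input point satisfying the path predicate, or None."""
    for d in domain:
        if predicate(*d):
            return d
    return None


def toy_domain():
    # Hypothetical input domain: pairs of integers between -3 and 3.
    return product(range(-3, 4), repeat=2)


# A feasible path: both branch conditions can hold together.
feasible = find_point(lambda x, y: x > y and x + y == 1, toy_domain())

# An infeasible path: the conditions contradict each other.
infeasible = find_point(lambda x, y: x > y and y > x, toy_domain())

print(feasible, infeasible)  # (1, 0) None
```

A result of `None` establishes infeasibility only over the enumerated domain; for unbounded input types no such exhaustive check is available.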

Sampling  Cost  and  Effectiveness 

The costs and benefits of test data selection arise in two
general ways: sampling cost and sampling efficacy. Sampling cost
relates to the time, effort, and money expended in constructing a test
data set conforming to specified testing criteria. For random,
pseudorandom, or functional testing this cost may be minimal because these
methods are relatively intuitive. Moreover, inexpensive test tools
exist to aid in these simple forms of test data construction (Reifer &
Trattner, 1977). The cost of sampling subject to structural testing
constraints is greater because it involves

• the construction of a program graph

• the selection of paths providing the desired test coverage

• the determination of path feasibility by definition of data
points forcing each path's execution.

The cost of graph construction is basically linear in the
number of statements, although manual construction requires a slightly
higher order of effort for large and cumbersome graphs (Aho & Ullman,
1975). Path selection time is also proportional to the number of
segments or branches for their respective coverings, and often is
immediately obvious in manual analysis of small graphs (Gabow, Maheshwari, &
Osterweil, 1976). Basic path selection is also linear in the number
of branches, although more difficult to perform manually (Paige, 1975).
More complex criteria may require execution time that is a polynomial
function of the segments or branches. Full path testing is generally
impractical for all but the smallest of programs.

Feasibility analysis and data point selection are formally
unsolvable, but some current systems do perform this function for path
predicates of special forms. Path predicate composition is linear in
the number of statements on the path. The cost of path predicate
solution depends upon the efficacy of the routine employed for
feasibility analysis and upon (at least) the number and type of path
conditions (constraints).

Sampling effectiveness is related to the "speed" with which a
sampling technique discovers errors. The application of different test
points is clearly more efficient than that of simple random test
points, because duplication is precluded in the former. In a number
of studies, Howden (1976, 1978) compared the observed relative
efficacies of various test criteria (segment covering, branch covering,
structured testing, full path testing) but failed to give any measure
of the variability of each criterion's error discovery.

The efficacy of a particular data set selection approach should
be reflected in the reliability model employed in assessing program
correctness. More effective sampling strategies should increase the
detection rate in time-domain models as reflected in the hazard
function. Data-domain models should show faster reliability growth and/or
faster reliability estimate convergence.

Cost  of  Test  Data  Application 
and  Fault  Detection 

Associated with each test data point or set are the costs of
its application and results assessment. A test run may be a simple
procedure in which initial data are input interactively or a complicated
procedure in which test drivers are required to simulate environmental
inputs. In either case, it is probably reasonable to assume that each
test case incurs a fixed cost of application, although this cost may
vary greatly from system to system.

To be tested, a data point must invoke program execution, which
requires computing resources. Many programs require computing
resources (CPU time, memory) in quantities related to the values of
their inputs. The time and space efficiency orders of tested programs
dictate this resource use and hence impose a cost of test execution
dependent upon the values of test data input. Test data values ($d^k$'s)
that affect data structure size (e.g., matrix order) or loop bounds
(e.g., convergence tolerance) can vastly affect the cost of test data
application. In practice, running times appear to be the more costly
variable. Running time (elapsed or CPU) is directly related to the
length of the execution path associated with each data point.

Fault detection involves the examination of test results to
confirm correctness. The assessment of output correctness requires an
independent evaluation of the program's function on the test input by
what is termed a test oracle. Fault detection is easily performed for
simple programs whose function is in some sense invertible (say, a
square root routine), and can be performed inexpensively by a human
oracle. Output from large complex systems is generally much more
difficult to assess and often requires a simulation or execution of
another similar program for output validation. It is probably
reasonable to assume, however, that the cost of a test oracle is linear in
the number of test data points. Where correctness can be determined
only by comparison with a similar system the cost of detection may
resemble that of test execution, but in many cases correctness of an
output can be determined at a cost independent of the program time and
space necessary for its computation.

Cost  of  Error  Detection 
and  Correction 

Once a test input is found to cause premature program
termination or generation of improper output, a search proceeds for the
cause(s) of the fault. These causes are normally termed errors and
their detection normally requires code review to determine the failure
source(s). Error detection is still the most artful of programming
practices, depending upon deep knowledge of the program, vagaries of
the programming language employed, and other subtleties. It is
probably less than realistic to assume that error detection costs are
uniform over all program inputs. In the same way that prevalent
failures occur early in testing, obvious errors are generally found first.
In fact, part of the appeal of decreasing hazard function time-domain
models is the way in which this phenomenon can be represented.

Error correction involves the rectification of program
deficiencies by changing the text to reflect properly the functional
specifications. Error discovery is closely associated with error correction
and it is usually possible to integrate these costs as one function.
Textual changes are normally followed by retranslation of the source
to object program. Retranslation costs are usually polynomial in the
number of statements and, in most cases, linear. Retranslation costs
and turnaround times are often sufficiently great, however, that
collections of test inputs are batched so that a number of corrections
can be made between test submittals.

The  Cost  of  Errors  in 
the  Operational  Phase 

The costs associated with program errors that persist into the
operational phase can be dichotomized into two general categories: the
cost of failure and the cost of maintenance. The cost of failure is
the prime determinant of reliability requirements and can include

• opportunity losses, organizational disruption, and other
effects of system service suspension

• financial loss through improper accounting, over- or
underutilization of resources, improper purchases or sales, or
other real loss

• jeopardizing of human life.

The consequences of program failure to the developer-user are the
direct kinds enumerated above. The software supplier may incur many
indirect costs when a product is found to be unreliable, including

• loss of sales as a result of product reputation for
unreliability

• litigation and settlement costs incurred as the result of
program misperformance.

Both developer-users and software suppliers generally incur
the cost of maintenance. The developer may or may not have the
responsibility to maintain customer versions of a product, but must
acknowledge errors reported by users and correct them if continued sales of
the product are desired. As an approximation to reality, one may
assume that all of these costs can be integrated into a simple function
of the number of failures. Each failure incurs a fixed cost
representing the damage caused by the misperformance, and a maintenance cost
corresponding to some multiple of the error detection and correction
costs of testing. These costs are usually a function of the number of
remaining errors, $n_e$, but not necessarily linear in $n_e$ (for time-domain
models other than exponential).


No one has yet proposed a model which would dictate the type
and duration of testing depending upon the costs of sampling,
application, error detection/correction and program failure outlined above.
Most researchers would agree with Littlewood (1979) that it is
premature to attempt such a model before the computing community has
reached a consensus on the proper mix of tools and approaches for
different software development situations. In the sections above, the
cost order of magnitude for each testing activity and
result has been inferred. In this study experimental results will be
reported in hopes of laying the groundwork for an economic model of
testing. We believe that such an economic model is within the
construction capabilities of the software reliability community and we offer
this study as a step toward its resolution.


CHAPTER  III 


SOFTWARE  RELIABILITY  THEORY  AND  PRACTICE 


Introduction 

Software reliability has been a topic of much study and many
publications during the current decade. Software reliability relates
to the ability of a software system to perform as expected and is often
associated with the degree to which embedded errors can cause system
failure or misperformance. In a recent paper, Schick and Wolverton
(1978) reviewed competing models proposed to relate system reliability
to program errors and failure distributions. These approaches are
generally based upon differing models of the probability distributions
underlying fault occurrence and error frequencies as functions of
system structure and use.

All software reliability models recognize a relationship
between the existence of embedded errors and the time-distributed
failures that occur when such errors are "discovered" during program
execution. Schick and Wolverton dichotomized modern approaches to software
reliability assessment as either time-domain or data-domain models.
Time-domain models emphasize the failure distribution by hypothesizing
error discovery as a function of some measure of time. Differing
measures of time include the execution duration of the program (CPU time),
real (calendar) time intervals, and the number of separate applications
of input to the program.


rate ($z(t) = \alpha$) leads to a standard exponential distribution

$$f(t) = \alpha e^{-\alpha t}, \qquad R(t) = e^{-\alpha t}, \qquad t > 0;\ \alpha > 0.$$

Assumptions of increasing or decreasing error rate (modeling error
correction) can be expressed by

$$z(t) = \alpha\beta t^{\beta - 1}, \qquad t > 0;\ \alpha, \beta > 0,$$

yielding a two-parameter Weibull distribution for the error
interarrival density

$$f(t) = \alpha\beta t^{\beta - 1} e^{-\alpha t^{\beta}}, \qquad t > 0;\ \alpha, \beta > 0,$$

and reliability function

$$R(t) = e^{-\alpha t^{\beta}}, \qquad t > 0;\ \alpha, \beta > 0.$$
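
The relationship among the hazard, density, and reliability functions of the Weibull model can be checked numerically. The sketch below is only an illustration; the function names and the parameter values used in the example are assumptions, with `alpha` and `beta` standing for the two Weibull parameters.

```python
import math


def hazard(t, alpha, beta):
    """Error rate z(t) = alpha * beta * t**(beta - 1)."""
    return alpha * beta * t ** (beta - 1)


def reliability(t, alpha, beta):
    """R(t) = exp(-alpha * t**beta)."""
    return math.exp(-alpha * t ** beta)


def density(t, alpha, beta):
    """Interarrival density f(t) = z(t) * R(t)."""
    return hazard(t, alpha, beta) * reliability(t, alpha, beta)


# beta = 1 recovers the constant-rate (exponential) case.
print(hazard(5.0, 0.5, 1.0))  # 0.5

# beta below 1 gives a decreasing error rate, beta above 1 an increasing one.
print(hazard(1.0, 0.5, 0.5) > hazard(2.0, 0.5, 0.5))  # True
print(hazard(2.0, 0.5, 2.0) > hazard(1.0, 0.5, 2.0))  # True
```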

Early  Models  of  Finite 
Error  Content 

If error discovery rate is proportional to the number of
remaining embedded errors, then a more representative model of
time-varying reliability may be proposed. Assuming that the likelihood of
detection of each of the $n_e$ errors is independent, then the error
discovery rate for the $i$th error may be written as the constant function

$$z_i(t) = \alpha_i = \varphi\,(n_e - (i - 1))$$

where $\varphi$ is the instantaneous discovery rate for one error. Jelinski
and Moranda (1973) proposed this model designating

$$f_i(t_i) = \alpha_i e^{-\alpha_i t_i}$$

as the interarrival time density function for time intervals
$t_1, t_2, \ldots, t_k$ between successive error discoveries.
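
Because each interarrival time in this model is exponential with rate $\alpha_i$, its expected value is $1/\alpha_i$, so the expected time between discoveries grows as errors are removed. A minimal sketch, with hypothetical function names and illustrative parameter values:

```python
def jm_rates(n_e, phi):
    """Discovery rates alpha_i = phi * (n_e - (i - 1)) for i = 1..n_e."""
    return [phi * (n_e - (i - 1)) for i in range(1, n_e + 1)]


def jm_expected_intervals(n_e, phi):
    """Expected interarrival times 1 / alpha_i (mean of an exponential)."""
    return [1.0 / alpha for alpha in jm_rates(n_e, phi)]


# Illustrative values only: five embedded errors, phi = 0.1 per unit time.
for i, (alpha, mean_t) in enumerate(
    zip(jm_rates(5, 0.1), jm_expected_intervals(5, 0.1)), start=1
):
    print(f"error {i}: rate {alpha:.2f}, expected wait {mean_t:.2f}")
```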

Schick and Wolverton (1972) ventured the assumption that the
longer the time period from the last error discovery, the greater the
likelihood of discovery. A formalization of this notion is the error
rate function written as

$$z_i(t) = \varphi\,(n_e - (i - 1))\,t$$

where $\varphi$ is the original instantaneous discovery rate, t is the time
since the last error discovery, and $n_e$ and i are defined as above.
The associated interarrival density and reliability functions are

$$f_i(t_i) = z_i(t_i)\,R_i(t_i), \qquad R_i(t_i) = \exp\!\left(-\varphi\,[n_e - (i - 1)]\,\frac{t_i^2}{2}\right).$$
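
A short numerical sketch of the Schick-Wolverton functions follows. The closed-form mean used in `sw_mean_interval` is the standard mean of a Rayleigh distribution, $\sqrt{\pi/(2c)}$ for hazard $c\,t$; it is a textbook result brought in for illustration rather than a formula stated in the text, and all names and parameter values are hypothetical.

```python
import math


def sw_hazard(t, i, n_e, phi):
    """Error rate z_i(t) = phi * (n_e - (i - 1)) * t."""
    return phi * (n_e - (i - 1)) * t


def sw_reliability(t, i, n_e, phi):
    """R_i(t) = exp(-phi * (n_e - (i - 1)) * t**2 / 2)."""
    return math.exp(-phi * (n_e - (i - 1)) * t * t / 2.0)


def sw_mean_interval(i, n_e, phi):
    """Mean time to the i-th discovery (Rayleigh mean, an outside result)."""
    c = phi * (n_e - (i - 1))
    return math.sqrt(math.pi / (2.0 * c))


# Illustrative values only: five embedded errors, phi = 0.1.
print(sw_mean_interval(1, 5, 0.1))
print(sw_mean_interval(5, 5, 0.1))
```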


Both the Schick-Wolverton (1972) and Jelinski-Moranda (1973)
models may be generalized realistically to admit the possibility that
more than one error is found during testing intervals. Thayer et al.
(1976) proposed extensions, as

Jelinski-Moranda: $z_i(t_i) = \alpha_i = \varphi\,(n_e - n_{i-1})$

Schick-Wolverton: $z_i(t_i) = \varphi\,(n_e - n_{i-1})\,t_i$

where $n_{i-1}$ represents the cumulative number of errors found prior to the
$i$th testing interval and $n_0 = 0$.


As stated by Schick and Wolverton (1978), "the reliability
analyst should expect his time-domain model to be a predictor for both
the number of errors remaining and the mean time for the next error to
occur" (p. 2). The time-domain models cited above can be employed in
these regards after some testing has taken place, if estimators can be
used in place of unknown parameters. Thayer et al. (1976) derived or
cited maximum likelihood estimators (MLEs) for the $\alpha$'s and $n_e$ above,
as well as developing asymptotic variances and correlation coefficients
for these parameters. The general procedure for determining the MLEs
involves the simultaneous solution of two equations in $\alpha$ and $n_e$ (or
$\varphi$ and $n_e$). Approximate confidence regions for the parameters may then
be constructed using the asymptotic variances.

Goel's  Model 

In a recent paper, Goel (1978) proposed a time-domain model in
which failure interarrival times were considered the result of a
nonhomogeneous Poisson process (NHPP). Goel hypothesized a continuous
function n(t) giving the cumulative number of software failures
occurring by time t expended during validation. Because each failure is
assumed to be the result of a unique error, and because each error is
assumed to be corrected subsequent to failure, n(t) is a nondecreasing
function bounded by $n_e$, the total number of errors in the program. By
assuming that the number of errors detected in the interval
$(t, t + \Delta t)$ is proportional to the number of undetected errors, Goel
deduced that

$$n(t) = n_e(1 - e^{-\varphi t})$$

with $\varphi$ the constant of proportionality.




Goel (1978) imposed the Poisson postulates upon the random
variable n(t), as

• $n(0) = 0$

• each $t_1, t_2, \ldots$ is statistically independent,
where $t_i = \tau$; $n(\tau) = i$

• Pr(2 or more events in $(t, t + h)$) $= o(h)$

• Pr(exactly one event in $(t, t + h)$) $= \lambda(t)h + o(h)$

to derive a Poisson distribution for n(t), as

$$f_{n(t)}(k) = \Pr(n(t) = k) = \frac{[m(t)]^k e^{-m(t)}}{k!}$$

for $m(t) = \int_0^t \lambda(s)\,ds$.

The mass function $f_{n(t)}(k)$ has mean m(t), which Goel chose to equate to
the assumed deterministic function n(t) as:

$$m(t) = n_e(1 - e^{-\varphi t}).$$

To summarize the assumptions given above, the time distribution
of failures, n(t), is assumed to be a member of the family of functions
$\{n(t; n_e, \varphi)\}$. In the absence of knowledge regarding the values for $n_e$
and $\lambda$, a Poisson prior distribution for n(t) is given as $f_{n(t)}(k)$ with
mean

$$m(t) = \int_0^t \lambda(s)\,ds = n_e(1 - e^{-\varphi t})$$

up to any given time t. For a given $n_e$ and $\varphi$ one obtains

$$\Pr(n(\infty) = k) = \frac{n_e^k e^{-n_e}}{k!},$$

a Poisson distribution for the number of failures over a debugging
interval of indefinite length.
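
The mean value function and the associated Poisson mass function can be evaluated directly. The sketch below uses hypothetical function names and illustrative parameter values ($n_e = 10$, $\varphi = 0.2$); it is intended only to show how the two formulas fit together.

```python
import math


def mean_failures(t, n_e, phi):
    """Mean value function m(t) = n_e * (1 - exp(-phi * t))."""
    return n_e * (1.0 - math.exp(-phi * t))


def pmf(k, t, n_e, phi):
    """Pr(n(t) = k) for a Poisson count with mean m(t)."""
    m = mean_failures(t, n_e, phi)
    return m ** k * math.exp(-m) / math.factorial(k)


# Expected cumulative failures rise toward n_e as validation time grows.
for t in (1.0, 5.0, 20.0, 100.0):
    print(t, round(mean_failures(t, 10, 0.2), 3))

# Probability of having seen exactly 6 failures by t = 5.
print(pmf(6, 5.0, 10, 0.2))
```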

Letting $\bar{n}(t)$ indicate the number of errors remaining at time t,
one obtains mean and variance

$$E[\bar{n}(t)] = E[n(\infty) - n(t)] = n_e e^{-\varphi t}$$

$$\mathrm{Var}[\bar{n}(t)] = n_e + n_e(1 - e^{-\varphi t}) - 2n_e(1 - e^{-\varphi t}).$$

Moreover, if y is the number of errors found by time s, then the
conditional distribution of $\bar{n}(s)$, given this information, is

$$\Pr(\bar{n}(s) = k \mid n(s) = y) = \Pr(n(\infty) = y + k) = \frac{n_e^{\,y+k}\, e^{-n_e}}{(y + k)!}$$

with mean

$$E[\bar{n}(s) \mid n(s) = y] = n_e - y.$$

The reliability of the program over time t, starting at this time s, is
given by

$$R(t) = \exp\!\left(-n_e\left[e^{-\varphi s} - e^{-\varphi (s + t)}\right]\right).$$
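
The reliability expression above is the probability of observing no failure in the interval from s to s + t, that is, $e^{-[m(s+t) - m(s)]}$. A brief sketch with hypothetical names and illustrative parameter values shows its behavior:

```python
import math


def reliability(t, s, n_e, phi):
    """Probability of no failure in (s, s + t] under the NHPP model."""
    return math.exp(-n_e * (math.exp(-phi * s) - math.exp(-phi * (s + t))))


# Illustrative values only: n_e = 10 errors, phi = 0.2.
# More prior validation time s gives higher reliability over the same t.
for s in (0.0, 5.0, 10.0, 20.0):
    print(s, round(reliability(1.0, s, 10, 0.2), 4))
```

As t grows without bound the value tends to $e^{-n_e e^{-\varphi s}}$, the probability that no errors remain at time s.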

As a result of choosing a nonhomogeneous Poisson process to describe $n(t)$, Goel (1978) was able to derive maximum likelihood estimators for parameters $n_e$ and $\varphi$. Given a sample of observed validation times and associated cumulative error counts, $T = \{(y_1, t_1), (y_2, t_2), \ldots, (y_k, t_k)\}$, these estimators can be calculated. The independence of the $n(t_i)$'s permits the joint mass function, $\Pr(n(t_1) = y_1, n(t_2) = y_2, \ldots, n(t_k) = y_k)$, to be computed as the product of the marginal distributions $\Pr(n(t_i) = y_i; n_e, \varphi)$, yielding likelihood function $L(n_e, \varphi)$ with global maximum $L^*(n_e, \varphi; T)$ occurring where

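$$n_e\left(1 - e^{-\varphi t_k}\right) = y_k$$

and

$$n_e\, t_k\, e^{-\varphi t_k} = \sum_{1 \le i \le k} \frac{(y_i - y_{i-1})\left(t_i e^{-\varphi t_i} - t_{i-1} e^{-\varphi t_{i-1}}\right)}{e^{-\varphi t_{i-1}} - e^{-\varphi t_i}}.$$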
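The two likelihood equations have no closed-form solution, but they reduce to a one-dimensional root search: the first equation gives $n_e$ in terms of $\varphi$, and substituting it into the second leaves a single equation in $\varphi$. The sketch below solves it by bisection. It is an illustration under stated assumptions, not the numerical procedure Goel used: the function names, the bracket `[1e-6, 10]`, the convention $y_0 = 0$ at $t_0 = 0$, and the synthetic data are all hypothetical.

```python
import math


def score(phi, counts, times):
    """Right side minus left side of the second likelihood equation,
    with n_e eliminated through n_e = y_k / (1 - exp(-phi * t_k))."""
    total = 0.0
    y_prev, t_prev = 0.0, 0.0  # assumed starting point: y_0 = 0 at t_0 = 0
    for y_i, t_i in zip(counts, times):
        num = t_i * math.exp(-phi * t_i) - t_prev * math.exp(-phi * t_prev)
        den = math.exp(-phi * t_prev) - math.exp(-phi * t_i)
        total += (y_i - y_prev) * num / den
        y_prev, t_prev = y_i, t_i
    y_k, t_k = counts[-1], times[-1]
    n_e = y_k / (1.0 - math.exp(-phi * t_k))
    return total - n_e * t_k * math.exp(-phi * t_k)


def fit(counts, times, lo=1e-6, hi=10.0, iterations=200):
    """Bisection on phi; returns (n_e, phi). The bracket is an assumption."""
    f_lo = score(lo, counts, times)
    if f_lo * score(hi, counts, times) > 0:
        raise ValueError("no sign change inside the bracket")
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        f_mid = score(mid, counts, times)
        if f_lo * f_mid <= 0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
    phi = 0.5 * (lo + hi)
    n_e = counts[-1] / (1.0 - math.exp(-phi * times[-1]))
    return n_e, phi


# Synthetic cumulative counts that follow the mean function exactly
# (hypothetical values n_e = 100, phi = 0.3), used only to check the solver.
times = [float(i) for i in range(1, 11)]
counts = [100.0 * (1.0 - math.exp(-0.3 * t)) for t in times]
print(fit(counts, times))
```

With real data the cumulative counts are integers and the bracket may need widening; data that show no slowing of the failure rate can leave the score without a sign change, in which case the sketch raises an error rather than returning an estimate.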


Goel provided numerical solutions for two sets of data that provide a reasonable fit to the empirical failure times. By invoking asymptotic normality for the maximum likelihood estimators, Goel computed a covariance matrix for $(n_e, \varphi)$ and provided a time-varying estimate and confidence band for the actual number of errors, $n(\infty)$, and the number of remaining errors, $\bar{n}(t)$. The actual error predictor $n(t)$ appears quite good (as it appears on the reported plots), whereas the remaining error-predictor, $\bar{n}(t)$, provides a reasonable fit to the empirical results for all but large values of $t$. This underestimation of $\bar{n}(t)$ for large $t$ would tend to justify Littlewood and Verrall's (1975) claim that exponential models of failure interarrival times are deficient near the end of the validation process.

A Bayesian Approach

A number of investigators have proposed reliability models in which the distributions of interest (failure interarrival, reliability, etc.) are updated to reflect the information gained during program testing (Littlewood & Verrall, 1973; Goel & Okumoto, 1978). In a recent work, Littlewood (1979) argued that a proper measure of program reliability is performance-oriented, and that the distribution of future failures calls for a subjectivist's view of this nondeterministic phenomenon.

Littlewood (1979) offered a model in which the source of uncertainty regarding failures is dichotomized between the nondeterminism of program inputs and the probability that an input leads to failure. Littlewood hypothesized an input set $D$ and regarded the subset $D^e$ that leads to program failure as random, because different software development efforts lead to different partitionings of $D$ into correct and failure-producing input subsets, $D^c$ and $D^e$.

As a program is debugged $D^e$ changes, as denoted by the sequence

$$D^e_1,\ D^e_2,\ \ldots$$

corresponding to the failure-producing input sets associated with each program version. Associated with each $D^e_j$ is a failure rate, $\lambda_j$, which serves as a parameter in the time-to-next-failure distribution

$$f(t_j \mid \lambda_j) = \lambda_j\, e^{-\lambda_j t_j}.$$

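Littlewood (1979) offered the gamma distribution as an appropriate family for modeling uncertainty regarding $\lambda_j$,

$$f(\lambda_j \mid \alpha, \beta_j) = \frac{\beta_j^{\alpha}\, \lambda_j^{\alpha-1}\, e^{-\beta_j \lambda_j}}{\Gamma(\alpha)}.$$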

The mixing of these two distributions yields a member of the Pareto family of distributions:

$$f(t_j \mid \alpha, \beta_j) = \frac{\alpha\, \beta_j^{\alpha}}{[t_j + \beta_j]^{\alpha+1}}.$$

The parameter $\beta_j$ serves as a "growth function" of reliability over discrete time measured by $j$, and Littlewood (1979), unlike other researchers, suggested that this reliability growth is linear neither in time nor in the number of discovered errors. Rather, he suggested that most serious errors are detected early. This phenomenon can be modeled in the following way. Consider the program with $N$ initial bugs, subjected to testing in which error correction (bug removal) occurs with certainty. Assuming that the independent failure rate $\nu_i$ of each bug is identically distributed with mean $\nu$, then the failure rate between successive error discoveries is given by

$$\lambda_j = \nu_1 + \nu_2 + \cdots + \nu_{N-j+1}.$$

Letting

$$\Delta_j = \sum_{1 \le i \le j} t_i$$

denote the cumulative time to the $j$th failure, then one may express the conditional pdf of each undiscovered bug's error rate, $\nu$, as

$$f(\nu \mid \text{bug undiscovered in interval } (0, \Delta_{j-1})) = \frac{\Pr(\text{bug not eliminated in time interval } (0, \Delta_{j-1}) \mid \nu)\, f(\nu)}{\int_0^\infty \Pr(\text{bug not eliminated in time interval } (0, \Delta_{j-1}) \mid \nu)\, f(\nu)\, d\nu} = \frac{f(\nu)\, e^{-\nu \Delta_{j-1}}}{\int_0^\infty f(\nu)\, e^{-\nu \Delta_{j-1}}\, d\nu}.$$

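If a single error's failure rate is modeled as distributed according to

$$f(\nu) = \frac{\beta^{\varphi}\, \nu^{\varphi-1}\, e^{-\beta\nu}}{\Gamma(\varphi)},$$

then the conditional distribution $f(\nu \mid \text{bug undiscovered in } [0, \Delta_{j-1}])$ becomes

$$\Gamma(\varphi,\ \beta + \Delta_{j-1})$$

and the distribution of $\lambda_j$ becomes

$$\Gamma((N - j + 1)\varphi,\ \beta + \Delta_{j-1})$$

with mean

$$\frac{(N - j + 1)\varphi}{\beta + \Delta_{j-1}}$$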

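serving as the hazard function $z(\Delta_j)$. Given the foregoing assumptions, Littlewood (1979) concluded with the unconditional failure interarrival distribution

$$f(t_j) = \int_0^\infty f(t_j \mid \lambda_j)\, f(\lambda_j; \varphi, \beta, \Delta_{j-1})\, d\lambda_j = \frac{(N - j + 1)\varphi\, (\beta + \Delta_{j-1})^{\varphi(N - j + 1)}}{(\beta + \Delta_{j-1} + t_j)^{(N - j + 1)\varphi + 1}}.$$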

The first model presented above with

$$f(t_j \mid \alpha, \beta_j) = \int_0^\infty f(t_j \mid \lambda_j)\, f(\lambda_j \mid \alpha, \beta_j)\, d\lambda_j = \frac{\alpha\, \beta_j^{\alpha}}{[t_j + \beta_j]^{\alpha+1}}$$

is termed the Littlewood-Verrall (L/V) model, as it was first presented by these researchers in an early joint paper (1973). Littlewood (1979) applied this model to a set of failure interarrival time data first presented and analyzed by Musa (1975). The author assumed a linear reliability growth function,

$$\beta_j = a_0 + a_1 \cdot j$$

and obtained the maximum likelihood estimators $\hat{a}_0$, $\hat{a}_1$, and $\hat{\alpha}$ over the first few observations $t_1, t_2, \ldots, t_k$ and employed $f(t_j \mid \hat{\alpha}, \beta_j = \hat{a}_0 + \hat{a}_1 \cdot j)$ to derive predictions of future failure rate, mean time to failure, and reliability. Littlewood (1979) obtained an excellent fit of predicted to actual times $t_{k+1}, \ldots, t_n$ using this model.


A  Model  Based  upon 
Order  Statistics 


There is every reason to believe that the order in which errors are found is nondeterministic; current software reliability theory models, however, uniformly model the failure phenomenon as successive events, each with a related probability distribution. At face value, it would seem an unjustified research decision to ignore the probability of underlying events (errors "waiting" in parallel time to be discovered) and jump directly to models of the distributions of ordered events. That no one questions this common research practice (see Goel, 1978; Schick & Wolverton, 1978; Littlewood, 1979) is understandable, for the assumptions of decreasing failure rate and negative exponential interarrival times are in conformance with assumptions of

•  equal  failure  rates  for  each  constituent  error,  and 

•  negative  exponential  failure  time  distributions 
for  each  error. 

In  short,  previous  investigators  have  been  able  to  ignore  the  order  in 
which  failures  occur  by  modeling  the  associated  errors  as  homogeneous 
Poisson  processes.  If,  however,  either  of  the  assumptions  above  is 
invalid,  no  reasonable  model  of  failure  times  can  be  considered  without 
consideration  of  the  failure  behavior  of  each  individual  error. 

The assumption that underlying errors are not identically distributed allows one to state simpler bases for a failure model:

•  If errors are assigned arbitrary error rates $\varphi_1, \ldots, \varphi_n$, and hypothesized to fail under Poisson assumptions in parallel time, then no general or common models of ordered failure time distributions are possible.

•  Unless  some  relationship  is  assumed  among  the  probability 
distributions  of  underlying  errors,  no  inference  can  be 
made  about  unobserved  errors. 

•  It  is  counter-intuitive  and  not  in  keeping  with  most  pro¬ 
grammers'  experiences  that  all  errors  are  equally  "hard," 
or  randomly  pursued. 


A model is proposed that is based only upon these simpler assumptions of error independence and individuality. The specific assumptions are:

•  Each error that is inadvertently introduced into a program can be assigned a "difficulty rating" $\lambda_i$; this difficulty rating reflects the combination of its inherent subtleness or (conversely) ease of discovery, and the relative skill or predisposition of the tester in pursuit of this kind of error.

•  These difficulty ratings can be considered random variables identically distributed as $f_{\lambda}(\lambda_i)$, reflecting the imperfect program development skills of every software practitioner.

•  Software failures (either improper program termination or production of incorrect results) are induced by the submission of test data and identified by code and results review; this procedure can be modeled as a sampling procedure, and the time $t_i$ to discovery of an arbitrary error, $e_i$, is a random variable; each unordered failure time is subject to the same family of distributions $f_t(t; \lambda_i)$, which differ only in the value of their sole parameter $\lambda_i$.

•  The resulting distribution of the failure time for the $i$th error is a mixture of the form

$$f_{t_i}(x) = f_t(x) = \int f(x \mid \lambda_i)\, f(\lambda_i)\, d\lambda_i,$$

an identical distribution for each error, with cumulative distribution function $F_t(t)$.

•  Because the most useful indexing of failures is by their order of occurrence, the random variable of interest is $t_{(i)}$, the $i$th order statistic "drawn" from a population of $n$ program errors, with distribution

$$f_{t_{(i)}}(x; n) = i \binom{n}{i} F_t(x)^{i-1}\, [1 - F_t(x)]^{n-i}\, f_t(x)$$

as prescribed by results from the theory of order statistics (David, 1970).


The  remainder  of  this  section  shows  how  these  results  are  used  in  the 
development  of  a  new  time-domain  model. 

One may assume that a programmer makes $n$ errors, $(e_1, e_2, \ldots, e_n)$, over the course of software development and that these are identifiable and correctable within the body of the program. With each error $e_i$ is associated a difficulty index $\lambda_i$ that serves to represent the degree of effort, insight, and luck necessary to provoke and/or find this error. Because these errors are unintentional, one may assume that their introduction is stochastic and, further, that the continuum of "hard" to "easy" errors is reflected in the probability distribution for $\lambda$, of which each $\lambda_i$ is a particular realization. One may, for example, assume that the set of $(\lambda_1, \lambda_2, \ldots, \lambda_n)$ constitutes a sample from a gamma-distributed "difficulty index generator" where

$$f_{\lambda}(\lambda_i \mid \alpha, \beta) = \frac{\alpha^{\beta}\, \lambda_i^{\beta-1}\, e^{-\alpha \lambda_i}}{\Gamma(\beta)}.$$


It is normal to assume that debugging activity is nonspecific; that is, that each error is the object of a simultaneous search for all errors (Musa, 1975). If this is the case, then each failure time $t_i$ (time to evidencing of the $i$th error) is equivalently distributed and most likely parameterized by its difficulty index $\lambda_i$. If $\lambda_i$ is interpreted as an arrival rate, and the Poisson postulates assumed for $t_i$, then the negative exponential distribution describes the conditional distribution of $t_i$ on $\lambda_i$:

$$f(t_i \mid \lambda_i) = \lambda_i\, e^{-\lambda_i t_i}.$$


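Each $t_i$ is an instance of a random process that generates failure times; this process has an unconditional distribution of

$$f(t) = \int f(t \mid \lambda)\, f(\lambda)\, d\lambda$$

which for the prior assumed distribution is a gamma mixture of exponentials of the form

$$f(t \mid \alpha, \beta) = \int_0^\infty f(t \mid \lambda)\, f(\lambda)\, d\lambda = \int_0^\infty \lambda e^{-\lambda t}\, \frac{\alpha^{\beta} \lambda^{\beta-1} e^{-\alpha\lambda}}{\Gamma(\beta)}\, d\lambda = \frac{\alpha^{\beta}}{\Gamma(\beta)} \int_0^\infty \lambda^{\beta} e^{-\lambda(t + \alpha)}\, d\lambda = \frac{\alpha^{\beta}}{\Gamma(\beta)} \cdot \frac{\Gamma(\beta + 1)}{(t + \alpha)^{\beta+1}}$$

which will be denoted as $f_{GE}(t \mid \alpha, \beta)$.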

As a program is debugged, the portion of unscrutinized code and the subset of untested input domain shrink. Hence it may be more reasonable to assume that the instantaneous "failure rate" over time is

$$z(t) = \lambda t$$

as is assumed in the Schick-Wolverton (1972) model. Then the distribution of failure times should follow a Rayleigh distribution:

$$f(t \mid \lambda) = (\lambda t)\, e^{-\lambda t \cdot t/2} = \lambda t\, e^{-\lambda t^2/2}.$$

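The resulting unconditional distribution of $t$ is a gamma mixture of Rayleigh distributed random variables of the form:

$$f(t \mid \alpha, \beta) = \int_0^\infty \lambda t\, e^{-\lambda t^2/2}\, \frac{\alpha^{\beta} \lambda^{\beta-1} e^{-\alpha\lambda}}{\Gamma(\beta)}\, d\lambda = \frac{t\, \beta\, \alpha^{\beta}}{(t^2/2 + \alpha)^{\beta+1}}$$

which will be termed $f_{GR}(t \mid \alpha, \beta)$.

The arrival times of errors may be indexed in order of their occurrence, as

$$t_{(1)} \le t_{(2)} \le \cdots \le t_{(n)}.$$

Because each of the underlying failure times $t_1, t_2, \ldots, t_n$ is identically distributed, one may draw on the results of the theory of order statistics. Immediately useful results include:

$$f_{t_{(i)}}(x; n) = i \binom{n}{i} F_t(x)^{i-1}\, [1 - F_t(x)]^{n-i}\, f_t(x)$$

$$f_{t_{(1)}, \ldots, t_{(k)}}(x_{(1)}, \ldots, x_{(k)}; n) = \frac{n!}{(n-k)!} \prod_{j=1}^{k} f_t(x_{(j)})\, [1 - F_t(x_{(k)})]^{n-k}$$

where $f_t$ and $F_t$ are the density and cumulative distribution functions for the underlying distribution, respectively.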
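The single order-statistic density quoted above is easy to evaluate for any underlying distribution. The sketch below is illustrative only: the function name `order_stat_pdf` and the unit-rate exponential used as the underlying failure-time distribution are assumptions, not part of the proposed model.

```python
import math


def order_stat_pdf(x, i, n, pdf, cdf):
    """Density of the i-th smallest of n i.i.d. failure times at x:
    i * C(n, i) * F(x)^(i-1) * (1 - F(x))^(n-i) * f(x)."""
    F = cdf(x)
    return i * math.comb(n, i) * F ** (i - 1) * (1.0 - F) ** (n - i) * pdf(x)


# Assumed underlying distribution for illustration: unit-rate exponential.
def exp_pdf(t):
    return math.exp(-t)


def exp_cdf(t):
    return 1.0 - math.exp(-t)


# Density of the first and of the last of n = 4 failure times at x = 0.3.
print(order_stat_pdf(0.3, 1, 4, exp_pdf, exp_cdf))
print(order_stat_pdf(0.3, 4, 4, exp_pdf, exp_cdf))
```

For the exponential case the first order statistic is again exponential with rate $n$, which gives a quick sanity check on the formula.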

If each error committed in program development is equally "difficult," and $\lambda_i = \lambda$ for all $i$, then each failure time $t_i$ is identically distributed as

$$f_t(t \mid \lambda) = \lambda e^{-\lambda t}$$


for the Gamma-Exponential model, and for the Gamma-Rayleigh model:

$$f_{t_{(1)}, \ldots, t_{(k)}}(x_{(1)}, \ldots, x_{(k)}; n) = \frac{n!}{(n-k)!} \prod_{j=1}^{k} \frac{\beta\, \alpha^{\beta}\, x_{(j)}}{\left(x_{(j)}^2/2 + \alpha\right)^{\beta+1}}\, [1 - F_t(x_{(k)})]^{(n-k)}$$

which cannot be simplified because $F_t(x)$ has no closed form for arbitrary $x$.

A diagram depicting the proposed model is given in Figure 6. Errors $E_1$ through $E_n$ are introduced unknowingly into a program at the time of its development. Associated with each error $E_i$ is a difficulty index $\lambda_i$ that serves as the parameter of the distribution for $t_i$, the time until $E_i$ is discovered. Just as the introduction of errors into the program is a random process, so is the difficulty of each error so introduced. Hence the parameter $\lambda$ is also modeled as a random variable. The mixture of these distributions serves as the underlying distribution for the order statistics $t_{(i)}$. Each $t_{(i)}$ is the observation available for statistical analysis.


Figure 6. Error discovery in the proposed model (figure label: "The observed ordered failure times")


Data Domain Models


Assumptions and Terminology


One may loosely describe data domain modeling as an approach in which the form of testing is as important as, or more important than, the duration. Data domain approaches consider explicitly the input domain of the software and how it can be partitioned into relevant classes to better assess system reliability. Nelson (1973) and others have proposed partitioning the input domain $D$ into classes $D_i$, for which every member of $D_i$ executes the same program statements in the same order. Furthermore, $D$ can be dichotomized into sets $D^c$, for which every element yields a correct computation, and $D^e$, for which every element leads to an improper program output. Let $D_i^c$ and $D_i^e$ be similarly defined as the correct and incorrect subsets of each path partition $D_i$. Nelson (1973) defined a program's reliability $R$ as

$$R = 1 - \frac{|D^e|}{|D|}$$

where $|A|$ denotes the cardinality of set $A$. In the absence of the knowledge of the precise distribution of the inputs, $R$ can be interpreted as the probability that the program will execute correctly on any given run.

A simple reliability estimator can be derived by successive executions of a program. Lipow (Thayer et al., 1976) has shown that a sample of $n$ inputs yielding $n_e$ errors yields an unbiased estimator of $R$ as

$$\hat{R} = 1 - \frac{n_e}{n}$$

when inputs are randomly chosen from $D$ and each input $d \in D$ is chosen with equal probability $1/|D|$.
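A small sketch may make the relationship between the cardinality-based definition of $R$ and the sampling estimator concrete. The toy input domain (integers 0 to 999, with multiples of 50 treated as failure-producing) and the function names are hypothetical and serve only as an illustration.

```python
def nelson_reliability(domain, fails):
    """R = 1 - |D^e| / |D| for a finite input domain and a failure predicate."""
    n_fail = sum(1 for d in domain if fails(d))
    return 1.0 - n_fail / len(domain)


def estimate_reliability(n_runs, n_failures):
    """Sampling estimator: 1 - n_e / n for n runs with n_e failures."""
    if n_runs <= 0 or not 0 <= n_failures <= n_runs:
        raise ValueError("need n_runs > 0 and 0 <= n_failures <= n_runs")
    return 1.0 - n_failures / n_runs


# Hypothetical domain: 1000 inputs, of which the multiples of 50 fail.
domain = list(range(1000))
print(nelson_reliability(domain, lambda d: d % 50 == 0))
# Hypothetical test outcome: 200 runs, 4 failures.
print(estimate_reliability(200, 4))
```

Executing every input once makes the estimator coincide with the definition; smaller uniform samples scatter around it.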

The data domain approach can be better characterized by considering how each input dictates the computation of the program's function $f$. As outlined in previous sections, a program can be represented as a collection of segments each of which computes a partial function $f_i$. An input data point $d$ forces the evaluation of the function

$$f_{i_1}(f_{i_2}(\cdots f_{i_k}(d)) \cdots)$$

by dictating the sequence of segments that will be executed. Whereas a time-domain approach treats only the frequency and duration of program computation, data domain models consider the relationship of input data class to program output.


The Mathematical Theory
of Software Reliability


Nelson extended the data domain reliability models to a sufficiently general body of formalism so that the results are termed the mathematical theory of software reliability (MTSR) (Thayer et al., 1976). Consider again a program that computes the function $f: D \to O$ by mapping input data to output results. The input set $D$ of cardinality $N$ is partitioned into a fault-producing input subset $D^e$ and the set of data inputs, $D^c$, which faithfully produce the outputs desired; their cardinalities are expressed as $N^e$ and $N^c$, respectively. The difference between $P$ and a correct program $P^*$ is the manner in which $f$ differs from the desired function $f^*$ when applied to members of $D^e$.

The MTSR attempts to formalize these notions in the following way. Assume that during operational use the program will be employed in a manner defined by the distribution of inputs. Assuming a finite input data space $D$, a mass function may be defined; each point $d_j \in D$ is assumed to be chosen with probability $P_j$. This set of $P_j$'s is termed the operational profile. Letting the characteristic variable $y_j$ indicate the membership of $d_j$ in $D^c$ ($y_j = 0$) or in $D^e$ ($y_j = 1$), then

$$q = \sum_{j} P_j\, y_j$$

denotes the probability that an operational run of $P$ will result in execution failure. Conversely, system reliability may be defined as

$$R = 1 - q$$

or the probability of correct execution in the operational environment. A discrete analogue to the time-domain reliability function is

$$R_k = (1 - q)^k$$

which represents the probability of $k$ successive correct runs.
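As a brief illustration of these definitions, the sketch below computes the failure probability $q$ from an operational profile, then $R$ and the probability of $k$ successive correct runs. The three-input profile and the function names are hypothetical.

```python
def failure_probability(profile, fails):
    """q = sum over inputs of P_j * y_j, where y_j is 1 for a failing input."""
    if abs(sum(profile) - 1.0) > 1e-9:
        raise ValueError("operational profile must sum to 1")
    return sum(p * y for p, y in zip(profile, fails))


def run_reliability(q, k=1):
    """Probability of k successive correct runs: (1 - q)^k."""
    return (1.0 - q) ** k


# Hypothetical profile over three inputs; only the third input fails.
profile = [0.5, 0.3, 0.2]
fails = [0, 0, 1]
q = failure_probability(profile, fails)
print(q, run_reliability(q), run_reliability(q, k=3))
```

The same program therefore has a different reliability under a different operational profile, even though its set of failing inputs is unchanged.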

The reliability function may be further investigated by considering test data distributions other than the operational profile $\{P_j\}$. Let

$$p_i = \sum_{j:\, d_j \in D_i} P_j$$

$$p_i = p_i^{c} + p_i^{e}$$

where 


F,  0 
c  1 


denote the probabilities associated with drawing an arbitrary ($p_i$), a fault-producing ($p_i^e$), or a "correct" ($p_i^c$) data input point from partition $D_i$. A sample of $n$ data test points is to be applied, with $n_i$ of the $n$ corresponding to the $i$th partition $D_i$. Assuming that a predetermined sample size $n_i$ of test points from partition $D_i$ is to be drawn, the outcome of the whole sample (i.e., test ensemble) may be denoted by $\{e_i\}$, the number of test points in the $i$th class leading to execution failure of the $n_i$ applied. Hence any sequence of $n_i$ points sampled from $D_i$ has an associated point probability of


$$\mathrm{Var}[\hat{R}] = \sum_{i \in T} \frac{p_i^2\, \mathrm{Var}(e_i)}{n_i^2} = \sum_{i \in T} \frac{p_i^{e}\, p_i^{c}}{n_i}$$

and the mean square error is given by

$$E[(\hat{R} - R)^2] = \mathrm{Var}[\hat{R}].$$


A minimum variance estimator $R^*$ may be developed by appropriate choice of $n_i$'s. Thayer et al. (1976) showed that the minimum variance estimator $R^*$ results when


with  minimum  variance 


Var(R*) 


Thayer et al. (1976) developed a number of confidence limit formulations used to express the likelihood that the true reliability resides within a specified interval $(\hat{R}_L; \hat{R}_U)$. An approximate confidence interval

$$\hat{R} \pm \sqrt{\hat{V}}\; t(n;\ \alpha/2)$$

may be constructed about reliability estimator $\hat{R}$, with variance estimated as $\hat{V}$, assuming asymptotic normality of $\hat{R}$. A positive bias in $\hat{R}$ may lead one to employ a smaller $\alpha$ than would otherwise have been used.

Often one is interested only in the lower confidence limit $R_L$. This can be computed with a one-sided $t$ formulation as above, or by exact methods formulated by Neyman (1937). Defining the software's reliability over partition $D_i$ as $R_i = p_i^c / p_i$, the software's operational reliability may be expressed as the weighted average

$$R = \sum_i R_i\, p_i = \sum_i p_i^{c}.$$

Assuming a sample of $n_i$ data sets is drawn from each $D_i$, and $e_i$ lead to execution error, an "exact" lower limit $R_L(\alpha)$ can be given:

Va,  -  -  U  -  ^  (£)Vd 

where.  n  *  n^  +  rig  +  *  •  *  +11^  as  defined  above . 

Estimators of Program Error Content

A variety of other approaches has been suggested for estimating the number of errors remaining in a software package.

The simplest of these approaches is the handbook approach in which error proportion statistics are maintained by program type and testing phase. Moranda (1975), for example, cited the factor two errors per hundred object instructions as a universal error proportionality. Walston and Felix (1976) reported on the development statistics aggregated over 60 programming projects of varying sizes. Letting $n_e'$ denote the number of errors per thousand source lines purged during validation, and $n_e''$ denote the number of errors per thousand source lines purged during the maintenance phase, these researchers reported the 25 percent, 50 percent, and 75 percent quartiles for $n_e'$ and $n_e''$ as (0.8, 3.1, 8.0) and (0.2, 1.4, 2.9), respectively. These and other findings appear to invalidate any assumption of a universal error content proportionality.

Other estimators involving a little more program-specific information include Halstead's (1977) error equation. The error equation is derived from assumptions regarding human information processing capability and relates to other derivations that Halstead collectively terms software science. Halstead's error content estimate, $\hat{B}$, is given by

$$\hat{B} = V/3000$$

where $V$ denotes program volume given by

$$V = N \log_2 n$$

for a program of $N$ syntactic elements, $n$ of which are unique. A number of studies have validated $\hat{B}$ as a reasonable estimator of error content (Funami & Halstead, 1975; Cornell & Halstead; Love & Bowman, 1976).
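The error equation is simple enough to compute by hand, but a short sketch shows how the two quantities combine. The function names and the example counts (1000 syntactic elements, 64 of them unique) are hypothetical.

```python
import math


def halstead_volume(total_elements, unique_elements):
    """Program volume V = N * log2(n)."""
    if total_elements <= 0 or unique_elements <= 1:
        raise ValueError("need N > 0 and n > 1")
    return total_elements * math.log2(unique_elements)


def halstead_error_estimate(total_elements, unique_elements):
    """Error content estimate B = V / 3000."""
    return halstead_volume(total_elements, unique_elements) / 3000.0


# Hypothetical program: N = 1000 syntactic elements, n = 64 unique.
print(halstead_volume(1000, 64))          # 1000 * 6 = 6000.0
print(halstead_error_estimate(1000, 64))  # 6000 / 3000 = 2.0
```

Because the estimate depends only on counts of syntactic elements, it can be computed before any testing has taken place, which distinguishes it from the time-domain and data domain estimators.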

Mills (1970) proposed another error content estimation technique that has been further analyzed by Basin (1973). Assume that of $N$ syntactic units composing a program, $n_e$ are erroneous for some reason. An estimate of $n_e$ can be derived either by seeding $n_s$ new errors and allowing a tester to discover a subset of the $n_e + n_s$ errors present, or by comparing the errors detected by two independent testers. Assuming that the error samples are the result of simple random draws, an estimator can be derived from a maximum likelihood estimator based upon the hypergeometric distribution. The estimator is biased with both the bias and estimator variance as a function of $n_e$. Further details are given in Basin (1975).
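The seeding idea can be sketched with the ratio form usually associated with capture-recapture estimation: if seeded and indigenous errors are equally likely to be found, the fraction of seeded errors recovered estimates the fraction of indigenous errors recovered. This is an assumption-laden illustration, not necessarily the exact estimator analyzed by Basin; the function names and the counts are hypothetical, and the real-valued ratio is left unrounded.

```python
def seeding_estimate(n_seeded, found_seeded, found_indigenous):
    """Estimate of indigenous errors: n_s * (indigenous found) / (seeded found)."""
    if found_seeded <= 0 or found_seeded > n_seeded:
        raise ValueError("need 0 < found_seeded <= n_seeded")
    return n_seeded * found_indigenous / found_seeded


def two_tester_estimate(found_a, found_b, found_both):
    """Estimate from two independent testers: n_a * n_b / n_common."""
    if found_both <= 0:
        raise ValueError("testers must share at least one error")
    return found_a * found_b / found_both


# Hypothetical outcome: 20 errors seeded; tester finds 10 seeded, 6 indigenous.
print(seeding_estimate(20, 10, 6))      # 12.0
# Hypothetical outcome: testers find 15 and 12 errors, 9 of them in common.
print(two_tester_estimate(15, 12, 9))   # 20.0
```

Both forms break down when the overlap is zero, which is one practical reason the bias and variance of such estimators matter for small programs.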


CHAPTER IV


EXPERIMENTAL DESIGN AND PROCEDURES

Experimental Objectives

An experiment was designed to collect the data necessary to satisfy two experimental objectives. The basic objective of the experiment was the discovery of relationships between the quality of testing and each of the sources of testing variability that affect it. This was accomplished by an experimental design in which the testing performances of subjects were observed under varying "conditions" of alternative program types, testing approaches, and subject characteristics. It was a second objective of the experiment to acquire experimental observations to infer whether sources of variability in individual characteristics, program types, and testing methods affect the forms of distributions related to reliability and its growth over time.

Sources of Experimental Variability

Because few testing experiments have been conducted for publication, there existed an opportunity to collect new kinds of information from small debugging exercises that could better explain the relationships between program complexity, the arrival behavior of program failures (manifestation of errors), and the controlled (code review, test data selection) and stochastic (error discovery, error correction) durations of debugging activities. Because of the high cost and short duration of the proposed experiment, it was necessary to consider at length the dependence between the observational variables chosen and the mechanical considerations necessary to ensure that these data were collected accurately. The experimental activities of the testing subjects had to be representative of "real" software testing, and yet be sufficiently structured that subjects could participate in data recording without undue annoyance.

The experiment was designed around three "objects" varied to allow inferences to be made regarding the nature of testing. (See Figure 7.) The experimental objects include:

•  The Tester-subjects: a group of 22 professional programmers chosen to represent a universe of conventional software developers of small- to medium-scale software.

.  The  Experimental  Programs:  a  set  of  four  small  (60-JOO 
lines)  programs  chosen  to  represent  a  universe  of  conven¬ 
tional  applications  over  several  problem  domains. 

•  The Testing Methods: two approaches, "white-box" and "black-box," chosen to represent one testing approach for each of two extreme points of view.

The  details  regarding  these  experimental  factors  are  given  below. 


Experimental  Subjects  and 
Individual  Differences 

In his analysis of a large verification experiment, Hetzel (1975) found that computing experience/education and self-confidence were significantly related to better testing performance, with the "best" of 39 subjects performing two to three times as productively as the "worst." In deference to Hetzel's findings, it was assumed that the differences in individuals that affect testing performance could be estimated by obtaining values for personal attributes.

Figure 7. A pictorial overview of the experiment

Each of 22 experimental subjects was interviewed through a questionnaire to obtain biographical data that might have a bearing on subject performance. The data requested from each subject included

•  Subject Name

•  Subject Age

•  Subject Sex

•  General Educational Background and Degrees Earned

•  Educational Background in Computer-Related Studies

•  Educational Programming Experience

•  Educational Background in Programming Theory and Practice

•  Professional Background in the Information Sciences

•  Professional Programming Experience

•  Composition of Programming Experience by Programming Language

•  Composition of Programming Experience by System Environment (batch/interactive)

•  Subject Estimation of Proficiency in Areas Useful to Programmers

•  Subject Indication of Familiarity with Theory and Practice Regarding Each Experimental Program

Subjects were asked to include in their educational background any high school or college coursework, as well as training schools, company-sponsored educational programs, and extended professional seminars. Professional experience included full- and part-time employment as well as independent development of skills. A self-assessment of proficiency was asked of each user in categories ranging from academic fields of study to computing skills; responses were chosen from a five-point ordinal scale.


In  estimating  his  proficiency  in  the  theory  and  implementation 
of  each  of  the  four  problem  classes  represented  by  the  experimental 
programs,  a  subject  could  choose  among  five  ordered  levels  of  familiar¬ 
ity  with  the  problem.  Copies  of  the  questionnaires  used  are  included 
in  Appendix  A. 


Experimental  Programs  and 
Their  Characteristics 

It  is  a  natural  and  common  belief  among  computing  practitioners 
and  researchers  that  some  programs  are  more  difficult  to  program  and 
debug  than  others.  Thayer  et  al.  (1976)  and  others  indicated  that  the 
propensity  to  err  in  programming  is  related  to  the  "size"  of  the  task 
being  attempted  and  the  number  of  paths  of  execution  possible  within 
an implementation (i.e., program performing the task). One may denote
"size"  as  computational  content,  the  degree  of  arithmetic  and  symbolic 
processing  represented  by  a  program.  Logical  complexity  is  the  usual 
nomenclature  used  to  denote  the  degree  to  which  a  program  departs 
from  simple,  sequential,  unconditional  execution. 

It was hypothesized that the degree of computational content and logical complexity associated with a program would materially affect the performance of experimental subjects attempting to find program errors. Hence an integral part of the experimental design was the choice of four programs, representing each combination of high and low degrees of computational content and logical complexity. The logical complexity of a program was measured by McCabe's (1976) complexity measure and by TRW Systems' complexity metric (Thayer et al., 1976). The degree of a program's computational content was assessed by a method inspired by Halstead's (1977) metrics, which are defined in terms of the number of program operators and operands. The formulae for these metrics and calculations for each experimental program are presented in Appendix B.
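Appendix B is not reproduced here, so the following sketch uses the standard published definition of McCabe's measure, $v(G) = e - n + 2p$ for a control-flow graph with $e$ edges, $n$ nodes, and $p$ connected components, rather than the exact formulae applied in the experiment. The function names and the example graph are hypothetical.

```python
def cyclomatic_complexity(edges, nodes, components=1):
    """McCabe's cyclomatic number v(G) = e - n + 2p (standard definition)."""
    if nodes <= 0 or components <= 0:
        raise ValueError("need at least one node and one component")
    return edges - nodes + 2 * components


def cyclomatic_from_decisions(binary_decisions):
    """Equivalent count for a single structured routine: decisions + 1."""
    return binary_decisions + 1


# Hypothetical routine: 9 edges, 8 nodes, containing two IF statements.
print(cyclomatic_complexity(9, 8))     # 9 - 8 + 2 = 3
print(cyclomatic_from_decisions(2))    # 2 + 1 = 3
```

A purely sequential program scores 1 under this measure, which matches the description of logical complexity as departure from simple, sequential, unconditional execution.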

So that this investigator would have a feeling for the range and sensitivity of the metrics, these measures of logical complexity and computational content were computed for a sample of programs chosen from an available program library. In this way, upper and lower bounds were defined for low and high settings, respectively, of each metric. Details are given in Appendix B.

All other decisions regarding the form and substance of the
experimental programs were made in favor of simplicity and representativeness.
For reasons of convenience, accessibility, and ease of
experimental administration, the Keck Management Science Center of the
University of Southern California School of Business was chosen as the
experimental site. The Center's resident time-shared minicomputer system
(Hewlett-Packard 2000 Access System) was selected as the experimental
computing resource. All programs were written employing a small
subset of the BASIC language that most resembles other BASIC implementations
as well as semantically similar statements in COBOL, FORTRAN,
PL/I, and ALGOL.

The precise choice of programs to be used in the experiment
was a time-consuming, iterative procedure. To ensure that the results
of this research would be generalizable to a large population of program
types, a large set of diverse programming problems was considered,
from which four programs were chosen for use in the experiment. Each
program candidate, in turn, was formally specified and coded in BASIC.
The metrics were computed for each successively coded program. If a
program produced unambiguously high- or low-valued measures for each
metric, it was included as a candidate for one of the four experimental
programs.
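
The screening rule just described can be sketched as follows. The numeric bounds are purely hypothetical placeholders (the actual bounds were derived from the program library sample and are given in Appendix B), and the names `BOUNDS`, `level`, and `candidate_cell` are introduced only for this illustration.

```python
from typing import Optional, Tuple

# Hypothetical bounds: a value at or below "low_max" counts as low, and a
# value at or above "high_min" counts as high. Anything between is ambiguous.
BOUNDS = {
    "logical_complexity": {"low_max": 10.0, "high_min": 25.0},
    "computational_content": {"low_max": 500.0, "high_min": 1500.0},
}


def level(metric: str, value: float) -> Optional[str]:
    """Return 'low', 'high', or None when the value is ambiguous."""
    bounds = BOUNDS[metric]
    if value <= bounds["low_max"]:
        return "low"
    if value >= bounds["high_min"]:
        return "high"
    return None


def candidate_cell(logical: float, computational: float) -> Optional[Tuple[str, str]]:
    """Return the (logical, computational) design cell, or None to reject."""
    levels = (
        level("logical_complexity", logical),
        level("computational_content", computational),
    )
    return None if None in levels else levels
```

A coded program is kept as a candidate only when both metrics fall clearly on one side; a program with a middling value on either metric is rejected.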

The  experimental  programs  chosen  were  deemed  as  mutually  di¬ 
verse  as  possible.  They  are  described  below  and  in  Appendix  C. 

•  ITAX: a 125-statement program that computes state and
federal income tax in conformity to a set of prescribed
rules for income and deduction admissibility. ITAX is
low in both computational content and logical complexity.

•  OPTM: a 70-statement program that solves for the local
optima of a fifth-degree polynomial. OPTM is high in
computational content and low in logical complexity.

•  SCNR: a 260-statement program that determines and includes
the symbols of a fictitious programming language. SCNR is
low in computational content and high in logical complexity.

•  LNPR: a 205-statement program that solves linear programming
problems by either primal or dual simplex algorithms.
LNPR is high in both computational content and logical
complexity.

Test  Data  Types 

Of the test data generation methods discussed in Chapter I,
two were chosen to determine whether any differences in debugging performance
could be attributed to differences in the "program stimuli"
(program output on failure) generated by alternative test set methodologies.

"White-box" test data were generated by following the procedures
outlined by Myers in The Art of Software Testing (1979). A
program control graph was constructed and input data points were devised
to ensure that all of the major program execution paths were
taken. A "black-box" test data set was also constructed in conformance
with Myers: test data points were included that verified conformance
to explicit rules of program behavior indicated by the program's
specification. Specific components of each test data set generation
category (black-box and white-box) are listed below. Descriptions of
each testing approach were given in Chapter I.


White-Box
Path-testing 
Branch-covering 
Structural  testing 


Black-Box 

Specification-based  testing 
Special  values  testing 
Functional  testing 


A  description  of  the  steps  taken  to  develop  white-box  and  black-box 
test  data  sets  is  detailed  in  Appendix  D. 
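
To make the distinction concrete, the sketch below contrasts the two kinds of test data for a deliberately tiny routine. The tax rule, the function `tax_due`, and every input value are hypothetical and are not drawn from ITAX or from Appendix D; the intent is only to show that white-box points are chosen from the code's paths, whereas black-box points are chosen from the rules and special values in the specification.

```python
def tax_due(income: float) -> float:
    """Hypothetical rule: no tax on non-positive income, 10% on the
    first 1000, and 20% on any income above 1000."""
    if income <= 0:
        return 0.0                          # path A
    if income <= 1000:
        return 0.10 * income                # path B
    return 100.0 + 0.20 * (income - 1000)   # path C


# White-box: one input per execution path through the code (A, B, C).
white_box_inputs = [-50.0, 400.0, 3000.0]

# Black-box: special values taken from the stated rule, without reading
# the code (the zero point and the bracket boundary on either side).
black_box_inputs = [0.0, 1000.0, 1001.0]

results = {x: tax_due(x) for x in white_box_inputs + black_box_inputs}
```

The two sets overlap in what they exercise, but only the black-box set probes the boundary at 1000, which is where an off-by-one comparison would show itself.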


Program  Errors 

In attempting to discover the factors affecting program debugging,
this and all prior experiments (Bubey et al., 1975; Howden,
1978; Hetzel, 1975) found it necessary to use real programs containing
"naturally occurring" errors. In an ideal experiment each subject
would develop a program from a common specification and be observed
during his or her pursuit of errors. However, the cost involved in
conducting even a moderate-sized experiment of this kind would be
prohibitive unless the resulting programs were of some commercial use.
It is precisely this high cost of experimentation that motivated Boehm
(1973) to implore the computing community to record data on programming
projects so that the results of comparable development activities could
be analysed in a post-facto "experiment."

To ensure a comparability of stimuli to each experimental subject,
it was necessary that each debugging participant face not only
the same programs, but the same population of unidentified errors.
After selecting the set of experimental programs, this investigator
debugged each program and subjected each program to a collection of
white-box, black-box, and ad hoc test data sets. Each error found was
numbered and categorized as computational, logic/sequencing, or data
handling.

These  error  categories  were  described  in  Chapter  XI;  a  listing 
of  the  specific  errors  found  is  given  in  Appendix  E. 

Experimental  Design 

Design  Objectives 

Each decision concerning the experiment was influenced by one
of two research motivations. The primary influence on the structure
of the experiment was the necessity that analysis be able to disambiguate
the effects of various factors on observed performance. For this
reason a fractional factorial experimental design dictated the assignment
of programs and test data to subjects. Although the effects of
individual differences could not be controlled, it was hoped that the
differences among the performances of subjects under similar conditions
could be partially explained by the (measured) differences in background
and personality. Some possible influences were controlled:
subjects faced identical programs on identical days of the week for
the same duration using an identical system.

The assignment of subjects to levels of experimental variability
is shown in Table 4. The table shows, for each subject, the group
of which the subject was a member, the order in which the subject was
required to debug the programs, and the type of test data sets provided
for each experimental program. Debugging order is indicated by
a permutation of the experimental programs' initials (I[TAX], L[NPR],
O[PTM], S[CNR]), and test data type is designated as black-box (BB) or
white-box (WB). The experiment produced 88 outcomes (22 subjects, each
debugging 4 programs), resulting from 11 replications of the 2 x 2 x 2
fractional factorial design.
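
The arithmetic behind the 88 outcomes and 11 replications can be checked with a short sketch. The alternating assignment rule used here is a hypothetical stand-in for Table 4, which is not reproduced; it is chosen only because it fills each of the eight program-by-test-data cells equally, and it ignores debugging order.

```python
from collections import Counter

PROGRAMS = ["ITAX", "LNPR", "OPTM", "SCNR"]


def assign(n_subjects: int = 22):
    """Hypothetical balanced plan: each subject debugs all four programs,
    with white-box (WB) and black-box (BB) data alternating across both
    programs and subjects."""
    plan = []
    for subject in range(n_subjects):
        for position, program in enumerate(PROGRAMS):
            test_type = "WB" if (subject + position) % 2 == 0 else "BB"
            plan.append((subject, program, test_type))
    return plan


plan = assign()
cell_counts = Counter((program, test_type) for _, program, test_type in plan)
```

Under this rule every subject uses each test data type on exactly two programs, and each of the eight cells is observed eleven times.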

The second general research motivation was the desire to provide
better answers to questions about important activities in software
development and to test some of the assumptions prevalent among software
researchers and practitioners. Some of these questions were:

•  Do  individual  differences  matter — age,  sex,  education,  or 
experience  with  similar  problems  or  similar  programming 
environments? 

•  Do  the  test  data  make  a  difference  in  the  rate  or  nature  of 
errors  found,  or  become  important  merely  when  verifying  pro¬ 
gram  correctness? 

•  Are some programs inherently more difficult to comprehend?
Are those differences quantifiable?

•  Are some types of errors more difficult to detect? Is debugging
a random sampling of remaining errors?




The experiment was designed so that at least some of these questions
could be formulated in terms of statistical tests. It was hoped that
for the remaining questions, the results of this study would at least
provide the basis for an informed opinion.

Notation  for  Factors 
and  Observations 

The experiment was designed to determine which personal and
environmental factors affect the detection and correction of program
errors. The recognition of erroneous program statements was considered
a significant event, and data collection forms were designed for recording
the specific errors found and the times of their discovery.

A variety of influences was expected to affect error discovery
times:

•  The programs in which the errors were found

•  The test data set types that subjects were permitted to use

•  The backgrounds and personality traits of the subjects.

The program characteristics that were expected to influence performance
were identified as the degree of logical complexity and computational
content. Each experimental program was chosen because it represented
one of four combinations of two factors at two levels. Moreover, for
each program, each subject was instructed to use one of two test data
sets, white-box or black-box. Hence each observation of a detected
error occurred in the context of one of eight experimental states.
Each subject was given a different ordering in which to debug the
experimental programs, in hopes of controlling any learning or degradation
effect upon performance. Two basic types of performance measures
were determined for each experimental quantum involving a subject and
a program: the number and kinds of errors found and the time distribution
of these events.

Let $E_{jklm}$ denote the number of errors found by subject $m$, where

$$
j = \begin{cases}
0 & \text{implies that the errors were found in a program of low logical complexity (ITAX or OPTM)} \\
1 & \text{implies that the errors were found in a program of high logical complexity (SCNR or LNPR)}
\end{cases}
$$

$$
k = \begin{cases}
0 & \text{implies that the errors were found in a program of low computational content (ITAX or SCNR)} \\
1 & \text{implies that the errors were found in a program of high computational content (OPTM or LNPR)}
\end{cases}
$$

$$
l = \begin{cases}
0 & \text{implies white-box test data were employed} \\
1 & \text{implies black-box test data were employed}
\end{cases}
$$

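Furthermore, denote error counts mnemonically, as:

$$
\begin{aligned}
I^{W}_{m} &\equiv E_{000m} && \text{(ITAX, white-box)}, \\
I^{B}_{m} &\equiv E_{001m} && \text{(ITAX, black-box)}, \\
O^{W}_{m} &\equiv E_{010m} && \text{(OPTM, white-box)}, \\
O^{B}_{m} &\equiv E_{011m} && \text{(OPTM, black-box)}, \\
S^{W}_{m} &\equiv E_{100m} && \text{(SCNR, white-box)}, \\
S^{B}_{m} &\equiv E_{101m} && \text{(SCNR, black-box)}, \\
L^{W}_{m} &\equiv E_{110m} && \text{(LNPR, white-box)}, \\
L^{B}_{m} &\equiv E_{111m} && \text{(LNPR, black-box)}.
\end{aligned}
$$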
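
The indexing scheme above amounts to a small lookup from program and test data type to the triple (j, k, l). The sketch below encodes it; the dictionary and function names are introduced here for illustration, and the sample finding is invented.

```python
# (j, k): logical complexity level, computational content level.
PROGRAM_FACTORS = {
    "ITAX": (0, 0),
    "OPTM": (0, 1),
    "SCNR": (1, 0),
    "LNPR": (1, 1),
}

# l: 0 for white-box test data, 1 for black-box test data.
TEST_DATA_LEVEL = {"white-box": 0, "black-box": 1}


def cell(program: str, test_data: str):
    """Return the (j, k, l) index of an experimental state."""
    j, k = PROGRAM_FACTORS[program]
    return (j, k, TEST_DATA_LEVEL[test_data])


def tally(findings):
    """Build E[(j, k, l, m)] from (subject, program, test_data, errors_found)."""
    counts = {}
    for subject, program, test_data, errors_found in findings:
        counts[cell(program, test_data) + (subject,)] = errors_found
    return counts
```

For example, a hypothetical subject 7 who found 4 errors in LNPR with black-box data would be recorded under the key (1, 1, 1, 7).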


For analyzing error-detection times and interarrival intervals, let
$t^{i}_{jklm}$ denote the time to the discovery of error number $i$ for subject
$m$, where $j$, $k$, and $l$ indicate factor settings as described above. The
superscript indexes the errors in an arbitrary order. Let $t^{(i)}_{jklm}$ denote
the time to the $i$th error detected and represent an order statistic.
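
The relation between arbitrarily numbered discovery times and their order statistics can be sketched as below. The times are invented, and measuring the first interarrival interval from the start of the debugging session is an assumption of this sketch rather than a convention stated in the text.

```python
def order_statistics(discovery_times):
    """Sort discovery times keyed by arbitrary error number into the
    sequence of times to the 1st, 2nd, ... error detected."""
    return sorted(discovery_times.values())


def interarrival_intervals(discovery_times, start=0.0):
    """Return the gaps between successive detections, the first gap being
    measured from `start` (assumed here to be the session start)."""
    ordered = order_statistics(discovery_times)
    previous = [start] + ordered[:-1]
    return [later - earlier for earlier, later in zip(previous, ordered)]


# Hypothetical minutes-to-discovery for errors numbered 3, 1, and 7.
times = {3: 40.0, 1: 12.0, 7: 25.0}
ordered = order_statistics(times)
gaps = interarrival_intervals(times)
```

The error numbers carry no ordering information; only after sorting do the values become the times to the first, second, and third detections.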

The number of errors known to be present in each program is fixed and
denoted as:

$$
\begin{aligned}
N_{00} &= N_{I} \equiv \text{the number of errors known to be present in ITAX}, \\
N_{01} &= N_{O} \equiv \text{the number of errors known to be present in OPTM}, \\
N_{10} &= N_{S} \equiv \text{the number of errors known to be present in SCNR}, \\
N_{11} &= N_{L} \equiv \text{the number of errors known to be present in LNPR}.
\end{aligned}
$$


All random variables will be denoted by the use of tildes;
hence, where $\tilde{x}$ may denote an observation prior to sampling, $x$
represents the sample outcome. Sums of indexed variables will be denoted
by the replacement of the index of summation by an asterisk. Hence,

$$
x_{jk*m} = \sum_{0 \le l \le 1} x_{jklm}
$$

$$
x_{*k*m} = \sum_{0 \le j \le 1} \; \sum_{0 \le l \le 1} x_{jklm}
$$

Means will be denoted similarly with the addition of a bar. Hence,

$$
\bar{x}_{jk*m} = \sum_{0 \le l \le 1} \frac{x_{jklm}}{2}
$$

and

$$
\bar{x}_{j**m} = \sum_{0 \le k \le 1} \; \sum_{0 \le l \le 1} \frac{x_{jklm}}{4}
$$
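
The asterisk and bar conventions can be illustrated for a single subject $m$ with a small sketch in which `None` plays the role of the asterisk. The complete table of values used here is hypothetical and serves only to exercise the notation; the function names are likewise introduced for illustration.

```python
from itertools import product


def star_sum(x, j=None, k=None, l=None):
    """Sum x[(j, k, l)] over every index passed as None (the 'asterisk').

    Returns the total and the number of terms summed."""
    total, terms = 0.0, 0
    for jj, kk, ll in product((0, 1), repeat=3):
        if j in (None, jj) and k in (None, kk) and l in (None, ll):
            total += x[(jj, kk, ll)]
            terms += 1
    return total, terms


def star_mean(x, j=None, k=None, l=None):
    """The barred quantity: the starred sum divided by its number of terms."""
    total, terms = star_sum(x, j, k, l)
    return total / terms


# Hypothetical values for one subject: x[(j, k, l)] = 4j + 2k + l.
x = {(j, k, l): float(4 * j + 2 * k + l) for j, k, l in product((0, 1), repeat=3)}

sum_j1, n_terms = star_sum(x, j=1)      # sums over k and l
mean_j0_k1 = star_mean(x, j=0, k=1)     # averages over l
```

Summing over one index gives two terms and a divisor of 2 in the mean; summing over two indices gives four terms and a divisor of 4.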


Experimental  Procedures 

Experimental subjects were recruited from industrial and academic
environments in the Los Angeles area. To be considered as a
prospective experimental subject, each candidate was interviewed to
determine if he would be representative of the population of programming
professionals. The criteria for subject selection were that:

•  The subject had been employed in a programming capacity
for at least 12 months in the previous five years.

•  The subject was currently employed in a computer-related
field that required at least occasional programming.

•  The subject had had at least a minimal exposure to the
BASIC programming language, and substantial experience
with a common procedural language (e.g., FORTRAN, COBOL,
or ALGOL).

•  The subject had had some experience with interactive,
terminal-oriented programming systems.

Twenty-two subjects were engaged for the experiment and assigned
one of three dates for participation in the experiment. Each subject
was given a packet containing:

•  An introductory letter outlining the purpose and scope
of the experiment.

•  A questionnaire to determine educational, biographical,
and professional backgrounds.

•  A set of four specifications describing the purpose, inputs,
outputs, and procedural approach for each of the experimental
programs ITAX, LNPR, OPTM, and SCNR.

•  Activity logs and program modification logs upon which to
record progress during the debugging of each program.

•  An experimental timetable and map to the experimental site.

•  A copy of the rules for Keck Center and the procedures necessary
for interacting with the Hewlett-Packard 2000 system.

•  A synopsis of the Hewlett-Packard system commands and the
BASIC language statements supported.

Copies  of  these  documents  can  be  found  in  Appendices  A,  C,  and  F. 

The set of subjects was split into three approximately equal
groups; each group met on a different Sunday, during which the computing
center was closed to all others. Each session was identical in
format, starting at 8 a.m. and finishing at 8:30 p.m. with scheduled
breaks for orientation and meals.

During the orientation session the subjects received a review
of useful system commands and anomalies of Hewlett-Packard BASIC that
had unavoidably been introduced within the experimental programs. All
subjects were instructed on the use of the data collection forms and
given the time durations allowed for each experimental program. Each
subject was told:

•  the system account number and password specifically
assigned to him or her,

•  the order in which the experimental programs were to
be debugged, and

•  for each program, whether white-box or black-box test
data were to be employed.


In each subject's program library were the programs and data files
necessary to conduct the experiment. These included:

•  The four completely debugged experimental programs named
ITAX, LNPR, OPTM, and SCNR; subjects were able to run
these programs, but were prevented by the system from
listing them.

•  The four undebugged experimental programs named ITAX00,
LNPR00, OPTM00, and SCNR00; these programs were constructed
by reinserting the original errors within ITAX, LNPR, OPTM,
and SCNR; subjects were permitted to modify, run, or list
these versions.

•  Eight test data sets: a "white-box" and a "black-box"
set for each of the four programs.

Subjects  were  asked  to  keep  track  of  the  time  spent  on  each 
program  and  required  to  record  on  activity  logs  the  start  and  stop 
times  of  each  debugging  activity.  Activities  included: 

•  Code Review/Error Detection: any activity in which errors
are being sought by reading the program listing and comparing
it to program specifications.

•  Error Correction: any significant time duration needed to
formulate the "fix" required to eliminate an identifiable
program error.

•  Terminal Work: any clerical activity involving use of the
computing system: making program changes, retrieving and
saving program versions, obtaining fresh program listings.

•  Test Data Set Development: developing test data to exercise
the experimental programs; subjects were not permitted
to develop their own test data unless they had debugged a
program well enough that it ran, without error, all prescribed
white-box or black-box test sets.

•  Break: any nontrivial mental or physical departure from
one of the above activities.
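
Start and stop times recorded in this way can be reduced to total time per activity. The sketch below assumes a hypothetical log layout of (activity, start, stop) entries with 24-hour clock times within a single day; the actual log forms appear in the appendices and may differ.

```python
from collections import defaultdict
from datetime import datetime


def minutes_by_activity(log):
    """Total the minutes spent on each activity.

    log: iterable of (activity, start, stop), times given as 'HH:MM'."""
    totals = defaultdict(float)
    for activity, start, stop in log:
        began = datetime.strptime(start, "%H:%M")
        ended = datetime.strptime(stop, "%H:%M")
        totals[activity] += (ended - began).total_seconds() / 60.0
    return dict(totals)


# Hypothetical entries for one subject and one program.
log = [
    ("Code Review/Error Detection", "09:00", "09:40"),
    ("Terminal Work", "09:40", "09:55"),
    ("Break", "09:55", "10:05"),
    ("Code Review/Error Detection", "10:05", "10:30"),
]
totals = minutes_by_activity(log)
```

Separating break and terminal time from review time in this way is what allows debugging effort to be compared across subjects.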




Upon  detecting  an  error,  a  subject  was  requested  to  post  the 
statement  number(s)  affecting  the  error,  the  time  of  discovery,  the 
corrective  action,  and  any  comments  that  may  have  been  of  interest  to 
this  investigator.  This  investigator  and  a  proctor  made  periodic 
checks on the progress of each subject and reviewed the data collection
sheets to ensure that activity was being properly documented. This
investigator  was  available  during  each  session  to  explain  perceived 
ambiguities  in  the  specifications,  aid  those  who  had  forgotten  system 
commands  and  procedures,  and  otherwise  minimize  the  time  that  subjects 
needed  to  spend  in  nondebugging  activities. 




CHAPTER  V 

DATA  ANALYSIS 

Basic  Results  and  Descriptive  Statistics 
The 22 subjects constituted a diverse set of software professionals.
Ten subjects worked full time for a company whose product
was not computer-related, and 7 subjects were regularly employed to
produce software systems for resale. The remaining 5 subjects were
academicians who were currently teaching computer studies and who had had
significant programming experience. Ages of the subjects ranged from 19
to 53. Median age was 28 and mean age was approximately 31. There
were 5 female subjects of the 22. A histogram of the subjects' ages
is shown below in Figure 8.

[Histogram of subjects' ages; horizontal axis runs from 18 to 54 years]

Figure 8. Distribution of subjects' ages


Educational attainment was measured by five increasingly
specific variables:

•  Highest degree earned, an ordinally scaled variable with
permissible values (0) none, (1) high school, (2) bachelor's,
(3) master's, (4) all-but-dissertation, or (5) doctorate.

•  Full-time years of schooling, which could include training
courses.

•  Courses in computer-related subjects, measured in "standard
units"; a quarter-unit was assigned 2 standard units
and a semester-unit was assigned 3 standard units.

•  Courses requiring a significant degree of programming,
measured in standard units.

•  Courses in programming theory and practice, measured in
standard units.
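
The conversion to standard units stated above is simple enough to write down directly. The course list in this sketch is hypothetical, as are the names `STANDARD_UNITS_PER_UNIT` and `standard_units`.

```python
# Conversion stated in the text: a quarter-unit counts as 2 standard
# units and a semester-unit counts as 3 standard units.
STANDARD_UNITS_PER_UNIT = {"quarter": 2, "semester": 3}


def standard_units(courses):
    """Total standard units for (calendar, units) course entries."""
    return sum(units * STANDARD_UNITS_PER_UNIT[calendar] for calendar, units in courses)


# Hypothetical transcript: one 3-unit semester course, one 4-unit quarter course.
total = standard_units([("semester", 3), ("quarter", 4)])
```

Putting both calendars on one scale lets subjects educated under quarter and semester systems be placed on the same frequency distribution.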

The subject set represented the full spectrum of educational background.
Subjects' years of schooling ranged from 9 to 26 years, and all degree
levels were represented. The diversity of training in computer studies
was similarly wide; three of the subjects had virtually no formal training
in the field, while three had over 30 semester units of computer
studies. Educational background in programming theory and practice was
particularly dichotomous, with 7 subjects having taken no courses,
whereas 11 subjects had taken at least 4 courses. Histograms of subjects'
educational background are shown in Figure 9.

The first histogram indicates the number of subjects within
each of the degree attainment categories (next to each "X" signifying
a subject is the number of years of schooling obtained by that subject).
The three remaining frequency distributions show where each
of the 22 subjects resides on educational scales measured by the number
of standard units (each one-third of a semester unit) taken in different
subject categories.


In Figure 10, the sample distribution of subjects' professional
experience is displayed. Any endeavor in the computing field was included
as general experience: programming, systems analysis, project
management, or teaching. Consistent with the wide disparity among
subject ages, the range of general professional experience was similarly
broad, from less than 10 man-months to over 160. The median was
approximately 55 man-months, indicating a mild skewing to the right.
Professional experience in programming was significantly more skewed,
with the bulk of the subjects having worked less than 40 man-months
in programming activities. The left-shifted median of approximately
20 man-months reflects the tendency of subjects to shift to analysis
and management roles with experience, as well as the non-programming
experience of the academicians in the sample.


[Panel axis: man-months of professional experience, programming]

Figure 10. Sample distributions of subjects' professional experience


The final question on the experimental questionnaire asked the
subjects to assess their knowledge and proficiency in 12 areas. This
question was motivated by Hetzel's findings in a related study (1975)
that confidence (as measured through self-assessment) correlated well
with testing performance. Hetzel asked a relatively homogeneous
group to rate themselves with respect to the other experimental subjects.
In this study, the subjects were (non-specifically) asked to
rate themselves. It was assumed that this nonspecificity would induce
the subjects to rate themselves against their perception of the general
knowledge and proficiency levels of the population of computing
professionals.


The group of subjects responded, as a whole, with a reasonably
high self-assessment. The most likely response for every category but
one (operations research) was AVERAGE or STRONG, and the response medians
were closer to STRONG. Questions regarding subject proficiency in
programming tasks and knowledge of data processing principles were
responded to with the highest self-assessment. Other programming-related
skills were generally responded to with positive self-assessment,
as was proficiency in mathematics. The two application
areas (probability/statistics and operations research) resulted in the
lowest self-assessment responses.

Table 5 shows a breakdown of responses by proficiency category.
A measure was constructed by a linear weighting of each response on a
scale of 1 to 5, resulting in a weighted self-assessment score for each
subject; a similar measure of overall sample self-assessment was also
calculated for each category. The resulting distribution of category


Table 5

Breakdown of Self-Assessment Ratings by Category

Ratings run from 1 (very weak) to 5 (very strong).

Category                   1     2     3     4     5

Data processing            0     1     7    11     2
Computer science           2     1     8     9     1
Systems analysis           0     2     8    10     1
Systems design             1     4     5    11     0
Program design             0     0     5    13     3
Program writing            0     1     7     9     4
Program debugging          0     1     8     7     5
Operations research        4     6     5     4     2
Probability/statistics     4     4     7     5     1
Mathematics                0     3     9     6     3
File handling              0     2     9     8     2
Algorithms                 1     4     8     5     3

Rating frequency          12    29    86    98    27

(a) Distribution of category scores

(b) Distribution of subjects' weighted self-assessment scores

Figure 12. An analysis of self-assessment


scores is shown in Figure 12(a). The sample distribution of subject
self-assessment scores is shown in Figure 12(b). The subjects showed
little variability in their overall self-assessment, as is demonstrated
by the clustering of most subject scores in the interval from 35 to 45
on a scale from 12 (all very weak responses) to 60 (all very strong
responses). The vast differences in the experience levels and educational
backgrounds of the subjects make this relative uniformity in
self-assessment somewhat surprising. It is conjectured that optimism
and confidence may have as much influence on a subject's self-assessment
as does the actual proficiency level developed by the subject.
In the academic areas (mathematics, computer science,
probability/statistics, operations research, and algorithms), it is
conjectured that education makes one aware of one's ignorance of the
area at a rate approximating one's mastery of the topic. In support
of this conjecture, it was found that some subjects with little formal
training ranked themselves equal to subjects with graduate work in an
area.
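
The weighted scores discussed above can be sketched as follows. A subject's score is taken as the sum of the twelve ratings coded 1 to 5, which reproduces the stated range of 12 to 60. The exact form of the per-category measure is not given in the text; treating it as the mean weighted rating is an assumption of this sketch. The rating counts are those listed in Table 5 for two of the categories.

```python
WEIGHTS = (1, 2, 3, 4, 5)  # very weak ... very strong


def subject_score(responses):
    """Sum of one subject's ratings over the 12 areas, each coded 1 to 5."""
    return sum(responses)


def category_score(rating_counts):
    """Mean weighted rating for one category, given how many subjects
    chose each of the five ratings (assumed form of the category measure)."""
    weighted = sum(weight * count for weight, count in zip(WEIGHTS, rating_counts))
    return weighted / sum(rating_counts)


program_design = category_score((0, 0, 5, 13, 3))
operations_research = category_score((4, 6, 5, 4, 2))
```

On this scale program design scores well above operations research, in line with the observation that the application areas drew the lowest self-assessments.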


After the subjects had studied the program specifications, each
was asked to specify his or her degree of familiarity with the principles
underlying the application dealt with by each program (theory),
and to indicate the extent of experience with programming similar
applications (practice). A summary of the responses is given in Table 6.
Subjects generally expressed a greater familiarity with the theory than
with the techniques of implementing these programs. As was to be expected,
subjects were most familiar with the theory and practice of programs
similar to ITAX, an income tax calculator. OPTM, the most specialized
of the problems, received the most negative responses, indicating lack
of exposure to the problem. SCNR and LNPR dichotomized the subject
sample somewhat. Those with computer science backgrounds or work experience
in certain system programming areas were certain to have seen
applications like SCNR, whereas all others most likely would not have.
Those with a business school educational background had most likely been
exposed to the theory behind LNPR (a linear programming code). This
proved to be so, judging from many subjects' expressed familiarity with
the theory. Most of these same individuals, however, would not generally
be engaged in scientific programming work, hence the very low
levels of experience in programming applications similar to LNPR.


Table 6

Breakdown of Familiarity Ratings

Ratings run from 1 (Not Familiar) to 5 (Very Familiar).

                   1     2     3     4     5

ITAX-Theory        1     3     2    12     4
ITAX-Practice      4     4     4     7     3
LNPR-Theory        3     4     7     6     2
LNPR-Practice      9     7     4     0     1
OPTM-Theory        4     8     4     6     0
OPTM-Practice     11     7     0     3     1
SCNR-Theory        4     3     4     7     4
SCNR-Practice      8     5     2     4     3

Totals            44    41    27    45    18

Table 7

Synopsis of Questionnaire Variables

[Table body not legible in the source scan]

Improper branching or statement sequencing, an improper
Boolean expression, altogether missing logic, or redundant
code.

•  DATA HANDLING. An error in the selection or initialization
of locations; these errors are the result of improper variable
initialization, the use of the wrong variable name, or
an error in the expression of a subscript or substring
designation.

The  enumeration  of  each  error  in  all  four  experimental  programs,  as  well; 
is  a  more  detailed  explanation  of  error  types  and  subclasses  is  given  ±J 
Appendix  E. 

Each program error was assigned an error type (COMPUTATIONAL,
LOGICAL, DATA HANDLING) and a subclass. Each error and its type/
subclass designation is listed in Tables 8 through 11. Each table
also shows the frequency of discovery by subjects for each known error.
It is apparent that these errors were nontrivial, since not one of the
52 errors was found by all 22 experimental subjects. There appears to
be a dichotomy in the difficulties associated with finding errors in
ITAX, LNPR, and OPTM. For these three programs, errors were found
either by most of the subjects or by few of them. The discovery
frequencies for errors within SCNR exhibit better continuity, and appear
somewhat like a discrete version of the negative exponential
distribution.

Each "X" in Figure 13 represents a program error, and is shown
over the scale value corresponding to the number of subjects who found
it. This figure indicates the approximate distribution of error
difficulties, as measured by the percentage of the subject sample who

Table 8

Errors Found: ITAX

Number Found    Error Type       Error Subclass

     6          Computational    Missing computation
     7          Computational    Missing computation
     5          Data handling    Var. by wrong name
    16          Logical          Wrong Boolean exp.
    15          Computational    Improper expression
    14          Computational    Improper expression
    12          Computational    Improper expression
     5          Data handling    Improper init.
    12          Logical          Wrong Boolean exp.
     5          Data handling    Var. by wrong name

Table 9

Errors Found: LNPR

Number Found    Error Type       Error Subclass

    12          Computational    Improper expression
    11          Logical          Wrong Boolean exp.
     1          Data handling    Subscript/substring
     5          Logical          Wrong branch/seq.
     8          Data handling    Subscript/substring
    16          Logical          Wrong Boolean exp.
     1          Data handling    Improper init.
     4          Data handling    Improper init.
     2          Computational    Missing computation
     2          Computational    Missing computation
     2          Computational    Missing computation
     4          Computational    Improper expression
     2          Logical          Wrong branch/seq.
     5          Logical          Wrong Boolean exp.
     2          Logical          Wrong branch/seq.
     2          Logical          Wrong branch/seq.


Table 10

Errors Found: OPTM

Number Found    Error Type       Error Subclass

    18          Computational    Improper exp.
    21          Computational    Improper exp.
    21          Computational    Improper exp.
     2          Computational    Improper exp.
    13          Logical          Wrong branch/seq.
     6          Computational    Machine lim.
    13          Logical          Wrong branch/seq.


Table 11

Errors Found: SCNR

[The Error Number and Number Found columns are not legible in the
source; rows are listed in the order printed.]

Error Type       Error Subclass

Logical          Missing logic
Logical          Missing logic
Logical          Missing logic
Logical          Missing logic
Logical          Missing logic
Data handling    Improper init.
Data handling    Improper init.
Computational    Improper exp.
Logical          Missing logic
Logical          Missing logic
Logical          Wrong branch/seq.
Data handling    Subscript/substring
Data handling    Var. by wrong name
Computational    Missing computation
Data handling    Improper init.
Data handling    Improper init.
Data handling    Improper init.
Computational    Improper exp.
Computational    Missing computation
Logical          Missing logic
Computational    Missing computation
Data handling    Improper init.
Data handling    Improper init.
Data handling    Improper init.
Data handling    Improper init.
Data handling    Improper init.
Logical          Redundant
Logical          Redundant
Logical          Redundant




discovered an error. If all errors were equally difficult, one would
expect to see a symmetric histogram. The variability of subject skills
may account for the spread of discovery frequencies for SCNR; the range
of discovery counts is much too large for the remaining programs,
however, to support the contention of similar error difficulties.

It is reasonable to expect error class frequencies to be related
to the kinds of programs in which the errors are found. It would
also seem possible that the discovery of particular classes of errors
might be related to program characteristics. For the purpose of
analyzing these possible relationships, all of the information regarding
error discoveries has been summarized in Table 12. The interior table
entries give the number of times that each error was found across all
22 subjects (numerator), and the number of errors represented by that
entry (denominator). Totals are provided across error types, error
subclasses, and programs, as well as the ratios resulting from the
division of frequency of discovery by frequency of occurrence.

Table 12

Breakdown of Error Types Found by Program


In reviewing the ratios for ITAX (10.5, 14.0, 5.0), LNPR (2.0,
5.14, 3.5), OPTM (13.6, 13.0, -), and SCNR (3.17, 2.75, 5.0), there is
no reason to suspect a relationship between the composition of error
discovery ratios and program types. Program types do appear to strongly
affect overall discovery ratios, considering the low discovery ratios
for the logically complex LNPR (4.81) and SCNR (3.60). Across all
programs, discovery ratios are highest for computational and logical
errors (7.25 and 6.5, respectively), and lower for data handling errors.
Correspondingly, the error population of OPTM, the program with the
highest discovery ratio, is composed mainly of computational errors,
while the error population of SCNR, the program with the lowest
discovery ratio, contains a higher percentage of data handling errors
than any other program. The error subclasses with the highest discovery
ratios, Improper Expression (11.0) and Wrong Boolean Expression (10.67),
correspond to the kinds of errors which would most visibly disagree with
the program specifications. The lowest discovery ratio, Missing Logic,
is associated with the least visible of all errors, since it applies to
situations in which a conditional transfer is missing. Perhaps second
in their lack of visibility are errors represented by the Missing
Computation category; this subclass also has the second smallest
discovery ratio.
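
The discovery ratio discussed above is the total number of discoveries
in a class divided by the number of errors in that class. The following
is a minimal sketch of that arithmetic, using the OPTM counts listed in
Table 10; the function name `discovery_ratios` is a hypothetical helper
and not part of the original study.

```python
from collections import defaultdict

def discovery_ratios(errors):
    """errors: iterable of (number_found, error_type), one pair per known error.

    Returns {error_type: (total discoveries, number of errors, ratio)}.
    """
    found = defaultdict(int)   # total discoveries per error type
    count = defaultdict(int)   # number of known errors per error type
    for number_found, error_type in errors:
        found[error_type] += number_found
        count[error_type] += 1
    return {t: (found[t], count[t], found[t] / count[t]) for t in found}

# Discovery counts and error types as listed in Table 10 (OPTM).
optm = [
    (18, "Computational"), (21, "Computational"), (21, "Computational"),
    (2, "Computational"), (13, "Logical"), (6, "Computational"),
    (13, "Logical"),
]

for error_type, (total, n_errors, ratio) in discovery_ratios(optm).items():
    print(f"{error_type}: {total}/{n_errors} = {ratio:.2f}")
# Computational: 68/5 = 13.60
# Logical: 26/2 = 13.00
```

These two values agree with the OPTM ratios (13.6, 13.0) quoted in the
text.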


Chi-Square  Analysis  of  Error  Frequencies 


One question of interest in this research is whether errors can
reasonably be modeled as equal in difficulty. Errors of equal difficulty
should have an equal opportunity to be "sampled" (by the debugging
programmer) by being discovered. The distribution of successive error
interarrival times could then reasonably be modeled as negative
exponential, as is assumed by various Software Reliability Theorists
(see Chapter IV).

A hypothesis of equal difficulties corresponds to a hypothesis
of equal error discovery frequencies. This latter hypothesis can be
tested by a χ² test. Each error is assumed to have an equal chance of
being included in the $n_i$ errors detected by the i-th programmer.
Hence, it is expected that each of the K errors will be discovered

\[
e = \sum_{1 \le i \le 22} n_i / K = N/K
\]

times, where N is the total number of errors found by all subjects.
Let $f_j$ denote the frequency of discovery for the j-th error. Then the
χ² statistic

\[
\chi^2 = \sum_{1 \le j \le K} \frac{(f_j - e)^2}{e}
\]

is approximately χ² distributed with K-1 degrees of freedom. Since
each error discovery is not a random sample, the chi-square test
conditions are not fully met. Since, however, each $f_j$ is bounded by
0 and 22 (the number of programmers), not 0 and N, this test is
conservative, since extreme values of $f_j$ are less likely, as are
large χ² values.
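
As a check on the procedure just described, the statistic can be
recomputed from the discovery frequencies listed in Tables 8 and 10.
This is a minimal sketch, assuming the usual Pearson form of the
statistic with expected frequency e = N/K; the function name
`chi_square_equal_difficulty` is hypothetical.

```python
def chi_square_equal_difficulty(frequencies):
    """Pearson chi-square statistic for the hypothesis that all K errors
    are equally likely to be discovered.

    frequencies: discovery counts f_j, one per known error.
    Returns (statistic, degrees of freedom).
    """
    k = len(frequencies)
    n_total = sum(frequencies)      # N: discoveries summed over all subjects
    expected = n_total / k          # e = N / K
    statistic = sum((f - expected) ** 2 / expected for f in frequencies)
    return statistic, k - 1

itax = [6, 7, 5, 16, 15, 14, 12, 5, 12, 5]   # "Number Found" column, Table 8
optm = [18, 21, 21, 2, 13, 6, 13]            # "Number Found" column, Table 10

for name, freqs in (("ITAX", itax), ("OPTM", optm)):
    stat, df = chi_square_equal_difficulty(freqs)
    print(f"{name}: chi-square = {stat:.2f} with {df} df")
# ITAX: chi-square = 18.98 with 9 df
# OPTM: chi-square = 23.96 with 6 df
```

Both values match the ITAX and OPTM statistics reported in Table 13.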

The results of this analysis are shown in Table 13. The
hypothesis of equally likely occurrences is rejected for each of the
four programs at a significance level of α = .05. Furthermore, the χ²
values for LNPR, OPTM, and SCNR are decidedly improbable, with
probabilities of occurrence of less than .01 under the stated
hypothesis. A second test was performed for SCNR after removing the
frequency accounting for 19 discoveries (a possible outlier); as can be
seen in the table, the remaining frequencies still differ significantly
from the hypothesized frequency distribution.

Tables 8, 9, 10, 11, and 12 and Figure 13 emphasized the
discovery frequency of errors in an attempt to ascertain which errors
appear more difficult to detect. The order in which errors are found
is nearly as important in determining error difficulty, if one defines
"easy" errors as those generally found early and "difficult" errors as
those generally found later. It is more difficult to work with error
discovery orders, since many individuals failed to detect some errors
altogether. One means of overcoming this difficulty, for each
individual, is to assign ranks to errors in the order in which that
individual discovered them. The errors that went undetected are then all
assigned the average of the remaining ranks.


Table 13

Chi-Square Analysis of Error Discovery Frequencies

Program    Number of Errors    χ² Statistic

ITAX             10                18.98
LNPR             16                66.17
OPTM              7                23.96
SCNR             29               119.03
SCNR             28                57.67


Development of Discovery Times

During the course of the experiment, data regarding the
activities of each subject were recorded. Every minute of each subject's
debugging session was accounted for, and significant events were
"time-stamped." These events included the start and stop times of each
testing activity, as well as the times for error discovery and
correction. Hence it was possible to compute the ordered error discovery
times for each subject on a time scale which excluded irrelevant
activities, and which was comparable for all subjects. For designated
errors, the total time used that was counted toward error discovery was
the minutes of code review and error correction time spent prior to
error discoveries. Any time spent by the subject using the computer
terminal, constructing test data, or "on break" was not accumulated in
these totals.

One distributional analysis of error discovery times was
conducted on a reduced set of data systematically extracted from the
discovery times recorded for each of the 22 subjects. For each
experimental program, subject error discovery times were studied and any
error with fewer than two discoveries among all 22 subjects was dropped.
Similarly, any subject who had found fewer than two errors was dropped
from the group under analysis. The resulting matrix of discovery times
generally included 60-80% of the known errors and from 10 to 16 of the
original subjects.

Tables 14, 15, 16, and 17 show the reduced data sets used for
the analysis of error discovery times. The columns indicate errors
that had a sufficient number of discoveries to warrant inclusion. Each
row of the table corresponds to a subject, and excludes those subjects
who found fewer than two of the errors analyzed. The few errors found by
the excluded subjects tended to be those found most often by the
remaining subjects. Hence, the remaining set of subjects is assumed to
be representative, if not of subject proficiency, at least of error
difficulty composition. The excluded errors had been discovered at most
once; dropping these errors from the estimation procedure was expected
to introduce very little error in the results.

There were a number of reasons motivating this decision to
reduce the analysis set. The philosophical justification for excluding


[Tables 14 through 17: reduced data sets of error discovery times, one
table per program (rows: subjects; columns: errors, headed "Error
Number"). The table entries are not recoverable from the source.]



low-performance subjects was that these subjects were not
representative of the population of skilled professionals to which these
results might be generalized. The practical reason for subject exclusion
was that many of the estimation and search routines exhibited undue
sensitivity to a subject's lack of data. The exclusion of errors with
only one observation was somewhat more justifiable, as this exclusion
did not affect the number of known errors. The cost in terms of
estimation accuracy of excluding a single observation was assumed to be
small. Moreover, the inclusion of single points proved to make some of
the estimation algorithms unusable for all other errors.

Rank Analysis of Ordered Discovery Times

The first analysis performed on the set of discovery times was
a rank analysis to determine whether there existed any uniformity in
the order in which subjects discovered errors. The discovery times for
each subject were ordered and assigned increasing ranks. Ties were
handled by the mid-rank method (i.e., if the m-th through (m + j)-th
discovery times were identical, all were assigned the rank (2m + j)/2).
For any subject, if exactly k errors were found, each of the n - k
undiscovered errors was assigned the mid-rank (k + 1 + n)/2. Ranks were
assigned for all 22 subjects and included all known errors.
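
The ranking rules above can be sketched as follows. This is an
illustrative reading of the procedure, not the study's own program: the
function name `discovery_ranks`, the use of `None` for an undiscovered
error, and the sample times are all assumptions.

```python
def discovery_ranks(times):
    """Assign mid-ranks to one subject's error discovery times.

    times: one entry per known error; a discovery time in minutes, or
           None if the subject never found that error (an assumed encoding).
    Returns the ranks in the same order as the input.
    """
    n = len(times)
    found = sorted((t, j) for j, t in enumerate(times) if t is not None)
    k = len(found)
    ranks = [None] * n
    i = 0
    while i < k:
        # Extend `end` over the run of identical discovery times.
        end = i
        while end + 1 < k and found[end + 1][0] == found[i][0]:
            end += 1
        # Sorted positions i..end hold ranks i+1..end+1; use their average,
        # which equals (2m + j)/2 with m = i + 1 and j = end - i.
        mid_rank = ((i + 1) + (end + 1)) / 2
        for _, j in found[i:end + 1]:
            ranks[j] = mid_rank
        i = end + 1
    # Undiscovered errors share ranks k+1..n, whose average is (k + 1 + n)/2.
    undiscovered_rank = (k + 1 + n) / 2
    return [undiscovered_rank if r is None else r for r in ranks]

# Hypothetical subject: six known errors, two never found, one tie at 12 min.
print(discovery_ranks([12, None, 5, 12, None, 30]))
# [2.5, 5.5, 1.0, 2.5, 5.5, 4.0]
```

Whatever the pattern of ties and misses, the ranks always sum to
n(n + 1)/2, which is what makes rankings from different subjects
comparable.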

An inspection of the ranks from subject to subject informally
confirmed the hypothesis that errors were of unequal difficulty. While
some errors were nearly always found early by all subjects, other errors
were found by no subjects. If all errors were equally likely to be
found in a given time, each permutation of discovery orders would be
equally likely. Such a hypothesis of equally likely permutations can
be tested by computing Kendall's coefficient of concordance. This
statistic (W) ranges from 0 to 1 and is a ratio involving the sums of
ranks across all subjects among whom discordance is hypothesized. The
formula for this statistic is given by


\[
W = \frac{12 \sum_{1 \le j \le n} S_j^2 - 3 m^2 n (n+1)^2}
         {m^2 n (n^2 - 1) - m T}
\]

where

m = the number of rankers,

n = the number of ranked objects,

\[
S_j = \sum_{1 \le i \le m} R_{ij} = \text{sum of ranks for error } j,
\]

\[
T = \sum (t^3 - t),
\]

where t is the number of ties in a set of tied ranks and the sum in T is
taken over all sets of ties. The hypothesis that is tested is one of no
association among the m rankings, and the value m(n-1)W is approximately
chi-square distributed with n-1 degrees of freedom (Gibbons, 1976).
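
The computation of W can be sketched as below. The tie-corrected form
used here is the standard textbook one and is assumed to correspond to
the formula cited from Gibbons (1976); the function name `kendalls_w`
and the sample rankings are hypothetical.

```python
from collections import Counter

def kendalls_w(rankings):
    """Kendall's coefficient of concordance with the usual tie correction.

    rankings: list of m rankings, each a list of n (mid-)ranks.
    Returns (W, chi-square approximation m(n-1)W, degrees of freedom n-1).
    """
    m = len(rankings)
    n = len(rankings[0])
    # S_j: sum of the ranks given to object j by all m rankers.
    rank_sums = [sum(r[j] for r in rankings) for j in range(n)]
    # T: sum of (t^3 - t) over every set of tied ranks in every ranking.
    tie_term = 0
    for r in rankings:
        for t in Counter(r).values():
            tie_term += t ** 3 - t
    numerator = 12 * sum(s * s for s in rank_sums) - 3 * m ** 2 * n * (n + 1) ** 2
    denominator = m ** 2 * n * (n ** 2 - 1) - m * tie_term
    w = numerator / denominator
    return w, m * (n - 1) * w, n - 1

# Hypothetical rankings of four errors by three subjects.
w, chi2, df = kendalls_w([[1, 2, 3, 4], [1, 3, 2, 4], [2, 1, 3, 4]])
print(f"W = {w:.3f}, chi-square = {chi2:.2f} with {df} df")
# W = 0.778, chi-square = 7.00 with 3 df
```

Identical rankings give W = 1 and exactly opposed pairs of rankings give
W = 0, which matches the stated range of the statistic. The sketch does
not guard against the degenerate case in which every ranker ties every
object, where the denominator is zero.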

Twelve hypothesis tests were conducted. For each of the four
programs a group W-test was conducted. Then each group was split
into two groups, on the basis of whether white-box or black-box test
data were employed.

One would expect the differences in test data (white-box or
black-box) to modify the order in which errors were found. If the
subjects' error pursuit is strongly conditioned by stimuli generated by
the program, and all errors are equally likely to be provoked by a
given set of test data, then different test data sets should influence
error discovery order differently. To investigate the effects of test
data type on error discovery order, Kendall's rank correlation measure
(τ) was computed using the pairs of rank sums for each test data group.
The formula for this measure is:

[Formula not legible in the source.]

where the summation is over all sets of tied ranks for each of the
groups and n is the number of ranked objects (Gibbons, 1976).
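
Because the formula itself did not survive in the source, the sketch
below uses the standard tie-corrected form of Kendall's rank correlation
(often called tau-b). That choice is an assumption; it is consistent
with the description of a correction summed over sets of tied ranks, but
it may not be the exact expression the author printed. The function
name and the sample rank sums are hypothetical.

```python
from math import sqrt

def kendall_tau_b(x, y):
    """Tie-corrected Kendall rank correlation between two paired sequences.

    x, y: e.g. the white-box and black-box rank sums for the same n errors.
    """
    n = len(x)
    score = 0      # concordant pairs minus discordant pairs
    tied_x = 0     # pairs tied in x
    tied_y = 0     # pairs tied in y
    for i in range(n - 1):
        for j in range(i + 1, n):
            dx = x[i] - x[j]
            dy = y[i] - y[j]
            if dx == 0:
                tied_x += 1
            if dy == 0:
                tied_y += 1
            if dx * dy > 0:
                score += 1
            elif dx * dy < 0:
                score -= 1
    pairs = n * (n - 1) / 2
    return score / sqrt((pairs - tied_x) * (pairs - tied_y))

# Hypothetical rank sums for four errors under the two test-data types.
white_box = [30, 12, 21, 47]
black_box = [28, 19, 15, 48]
print(round(kendall_tau_b(white_box, black_box), 3))
```

With no ties the denominator reduces to n(n - 1)/2, so the measure
becomes the ordinary Kendall tau.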

Tables 18 through 21 contain the results of this analysis.
For the two groups (white-box and black-box) a product-moment
correlation has also been computed for comparison with the τ statistic.
The coefficients of concordance show a very strong similarity of
rankings among subjects within each group for LNPR, OPTM, and SCNR
(significant at α = .001) and a weaker concordance for ITAX (α = .1).
This suggests that the low-complexity, low-computation ITAX may contain
errors of more similar difficulty, thus leading the subjects to generate
greater numbers of discovery permutations. Concordance among the total
group is high for all programs, which would lead one to anticipate
reasonably high correlations between group rank sums. Such is the case,
with the highest correlations found for the low logical-complexity ITAX
and OPTM. This is to be expected for two reasons. First, in
low-complexity programs the number of execution paths is limited and
test data will tend to have less of an influence over the stimuli
generated by program execution. Second, the number of total errors was
small and a much greater percentage of them were found by most subjects.
Hence, a massive assignment of mid-ranks for undiscovered errors was not
necessary (as it was for SCNR and LNPR), with its concomitant effect of
"watering down" rank discrimination.


Table 19

[Table body and notes not recoverable from the source.]


Table 20

Analysis of Discovery Time Ranks (OPTM)

[Columns: error number; mean rank (white-box); mean rank (black-box);
mean rank (total). Table entries not recoverable from the source.]

Notes. Kendall's tau statistic = .933
       Significant for α = .01
       Product-moment correlation = .992
       Coefficient of concordance (white-box) = .640
       χ² = 49.33 with 10 df
       Coefficient of concordance (black-box) = .668
       χ² = 51.50 with 10 df
       Coefficient of concordance (total)
       χ² = 100.45 with 10 df


Table 21

[Table body not recoverable from the source.]

Notes. Coefficient of concordance (total) = .270
       χ² = 178.67 with 10 df


Simple  Correlations 

Simple product-moment correlations were computed between all
combinations of questionnaire and performance variables (Table 22),
as these were needed for a later multiple regression. As will be seen
in later sections, many of these variables are subtly interrelated,
and simple correlations are insufficient as information for proper
interpretation. However, a few observations regarding the strongest
relationships will be made. The educational background variables were in
general positively correlated with all performance measures. The
experience category appears to be of mixed value in predicting
debugging performance. Of the self-assessment categories, proficiency in
the technical areas of computer science principles and algorithms was
significantly related to performance, as were, to a lesser degree,
mathematics and system design. Nearly all variables measuring subject
familiarity with the experimental programs were positively correlated
with performance. Moreover, familiarity with SCNR (the program which
most exemplifies applied computer science) was remarkably effective
in predicting overall subject success in the experiment.


Table 22

Simple Correlations of Questionnaire Variables with Performance Measures

                                        Number of Errors Found
Questionnaire                                                    Simple  Standardized
Variable                         ITAX    LNPR    OPTM    SCNR    Total   Total

Age                              -.290   +.102     ?     -.123   -.104   -.119
Years of Schooling               -.196   +.436   +.326   +.053   +.195   +.215
Highest Degree                   -.070   +.399   +.428   +.212   +.310   +.336
Computer-Related Course Work     -.104   -.008   +.280   -.103   -.037   +.023
Computer-Required Course Work    +.076   +.066   +.225   +.109   +.068   +.132
Theory & Practice Course Work    -.125   +.061   +.049   +.060   +.029   +.015
Experience: General              -.135   +.183   -.147   -.095     ?     -.085
Experience: Programming          -.121   +.242   -.352   -.104   -.048   -.116
Data Processing Principles       -.133   +.084   -.234   -.101   -.092     ?
Computer Science Principles      +.412   +.158   +.015   +.440   +.394   +.355
Systems Analysis                 +.075   +.234   +.207   -.128   +.077   +.134
Systems Design                   +.110   +.388   +.060   +.130   +.252   +.238
Program Design                   +.316   +.170   -.366   +.097   +.137   +.075
Program Writing                  +.143   +.266   -.436   +.222   +.181   +.067
Program Debugging                +.169   +.333   -.?88   +.260   +.240   +.129
Operations Research              +.1?1   -.265   +.044   -.042   -.087   -.053
Probability/Statistics           +.035   -.023   +.155   -.193   -.073   -.009
Mathematics                      +.235   +.161   +.224   +.078   +.199   +.242
File Handling                    -.006   +.130   -.088   -.077   -.005   -.014
Algorithms                       +.424   +.257   +.280   +.294   +.403   +.435
ITAX-Theory                      +.331   +.297    .000   +.103   +.253   +.254
ITAX-Practice                    +.295   +.316   +.088   +.297   +.362   +.345
LNPR-Theory                      -.097   +.064   +.3?7   -.302   -.098   +.004
LNPR-Practice                    +.120   +.234   +.383   -.005   +.180   +.254
OPTM-Theory                      +.244   +.316   +.307   -.069   +.196   +.276
OPTM-Practice                    +.436   +.396   +.290   +.154   +.390   +.443
SCNR-Theory                      +.182   +.417   +.237   +.625   +.562   +.506
SCNR-Practice                    +.375   +.507   +.434   +.595   +.663   +.662

[A "?" marks an entry or digit that is not legible in the source.]


Multiple  Regression  Analysis 

A series of linear stepwise multiple regression analyses was
performed using the performance variables (e.g., number of errors
found) as dependent variables and the individual questionnaire data as
independent variables. The intent of this analysis was not to arrive
at some useful formula for predicting an arbitrary programmer's
debugging performance. The small sample size (22) and large number of
independent variables (17) employed require a skeptical attitude toward
the reliability of these regression results as some form of screening
device for choosing potentially good debuggers. It was hoped, though,
that the regression results would confirm some general notions
developed by the investigator after many hours of studying the
experimental results.

A few observations regarding the subject group composition
help in understanding the results of the regression analysis. The
group was specifically chosen for its diversity, as the variability
in age, education, and experience indicates. One could also identify
subgroups of like subjects: older, more highly educated computing
educators; young, currently active system programmers; and middle-aged,
data-processing-oriented applications programmers. There was, in
the sample, a selection of extraordinary subjects: a bright,
sporadically performing ninth-grade dropout; a very high-performing
computer science professor with no formal education in the area save
self-study; and a highly educated, much-experienced subject with an
(apparent) background in systems programming (then education) who
performed relatively poorly. The point of these observations is that the
"average" group of programmers is quite often similarly diverse, and
any analysis of a small set of such subjects is apt to reflect the
widely varying personalities of the group as much as the general
associations sought.

The variables used in these regression runs included age,
educational background, experience, and self-evaluation variables
extracted from the questionnaires. A list of the variables used and
their mnemonic abbreviations is shown in Table 23. The results of
a forward-selection stepwise multiple linear regression, using these
variables as predictor variables and the number of errors found, by
program, as the dependent variable, follow. Table 24 shows the first six
variables selected by the regression routine for inclusion in the
predictive equation. With each variable is given the sign of the
regression coefficient associated with each successive predictor
variable.

Performance on ITAX was negatively affected by nearly all
educational variables, as well as by familiarity with the theory behind
SCNR, normally attained in an academic setting. All other variable
inclusions are inexplicable. ITAX was, however, the simplest of the
programs and most subjects did fairly well, leading one to conclude
that remaining associations (e.g., FPOPTM) may be spurious. Performance
on LNPR was expected to be strongly dependent upon an understanding
of the problem, an ability to comprehend a prolonged (mathematically)
algorithmic specification, and experience with some of the
subproblems within LNPR (e.g., matrix inversion). Experience with
problems like SCNR proved more important, as it also did for
performance on SCNR and OPTM.


Table 23

Mnemonics for Regression Variables

Mnemonic
Abbreviation    Variable

FTSCH           Years of Schooling
HIDEG           Highest Degree
SUCR            Computer-Related Course Work
SURP            Computer-Required Course Work
SUTP            Theory & Practice Course Work
EXPGEN          Experience: General
EXPPGM          Experience: Programming
DPPRIN          Data Processing Principles
CSPRIN          Computer Science Principles
SYSANL          Systems Analysis
SYSDES          Systems Design
PGMDES          Program Design
PGMWRT          Program Writing
PGMDBG          Program Debugging
OPSRES          Operations Research
PROBSTAT        Probability/Statistics
MATH            Mathematics
FILHAN          File Handling
ALG             Algorithms
FTITAX          ITAX-Theory
FPITAX          ITAX-Practice
FTLNPR          LNPR-Theory
FPLNPR          LNPR-Practice
FTOPTM          OPTM-Theory
FPOPTM          OPTM-Practice
FTSCNR          SCNR-Theory
FPSCNR          SCNR-Practice

Table 24


As with ITAX, most individuals found a fair number of errors in
LNPR, so that characteristics of the few exceptionally high- and
low-performing subjects undoubtedly had a greater effect on the
independent variables selected. No reasoning is apparent, however, for
the particular choices. Regression results for SCNR are well within
expectation. As SCNR was written as a finite state automaton,
educational exposure to this formal theory was expected to be important,
and proved so with the inclusion of FTSCNR, CSPRIN, and HIDEG. Since
language theory and, for example, compiler courses are rather recent
academic topics, it is not surprising to find negative regression
coefficients for age-related variables such as AGE, SUTP, and SUCR.

The last column of Table 25 shows the results from a regression
with a standardized performance measure as the dependent variable.
This variable was computed as the sum of the standardized scores on
ITAX, LNPR, OPTM, and SCNR. The standardized scores were computed by
subtracting from each subject's number of errors found the mean across
all subjects, then dividing this difference by the sample standard
deviation.
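
The standardized total can be sketched in a few lines. The function
name `standardized_totals` and the three-subject data set are
hypothetical, and the sample standard deviation is taken here with the
n - 1 divisor, which is an assumption about the original computation.

```python
from statistics import mean, stdev

def standardized_totals(scores_by_program):
    """Sum of per-program standardized scores for each subject.

    scores_by_program: {program name: [errors found by subject 0, 1, ...]}
    Every list must cover the same subjects in the same order.
    """
    n_subjects = len(next(iter(scores_by_program.values())))
    totals = [0.0] * n_subjects
    for scores in scores_by_program.values():
        mu = mean(scores)
        sd = stdev(scores)   # sample standard deviation (n - 1 divisor)
        for i, s in enumerate(scores):
            totals[i] += (s - mu) / sd
    return totals

# Hypothetical numbers of errors found by three subjects on two programs.
example = {"ITAX": [3, 5, 7], "SCNR": [2, 4, 9]}
print([round(t, 3) for t in standardized_totals(example)])
```

Because each program's scores are centered before they are summed, the
totals across subjects add to zero, and a program with many errors
cannot dominate the total simply through its larger raw counts. A
program on which every subject scored the same would have zero standard
deviation and would need to be excluded before calling this sketch.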

The first variable to enter the regression equation for this
standardized total was the ubiquitous FPSCNR. As stated above, it
appears that those subjects who had had experience programming
applications similar to SCNR performed significantly better than
average. Familiarity with parsing algorithms like SCNR would tend to
exemplify a sophisticated computer-science-oriented individual with
system programming experience, a likely candidate for high performance.
Advanced degree attainment appears as an important variable, even though
it is positively correlated with the adversely influencing age measures.
Further conclusions and speculations regarding the individual factors
affecting debugging performance require reference to results obtained
outside the regression analysis, and will be postponed until the next
chapter.


Table 25


Analysis of Variance


For each program, the number of errors found by a subject was
divided by the length of the debugging session (measured in hours)
and by the total number of errors known to be in the program. This mean
percentage of errors per hour (MFEH) measure was employed as the
criterion variable in an analysis of variance, performed in conformance
with a procedure outlined by Winer (1971). The division by the length
of the debugging session normalized the effect of differing time
allotments assigned for each program. The division by the known number
of errors adjusted the dependent variable (MFEH) for the fact that each
of the four programs had a different number of resident errors
available for discovery.
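The normalization just described can be sketched in a few lines. This is only an illustration: the function name and the numbers in the example are assumptions made here, not values taken from the experiment.

```python
def errors_per_hour_pct(errors_found, session_hours, known_errors):
    """Percentage of a program's known errors found per hour of debugging."""
    if session_hours <= 0 or known_errors <= 0:
        raise ValueError("session_hours and known_errors must be positive")
    return 100.0 * errors_found / (session_hours * known_errors)


# Hypothetical subject: 3 of 10 known errors found in a 1-hour session.
example = errors_per_hour_pct(3, 1.0, 10)
print(example)
```

Dividing by both the session length and the known error count puts programs with different time allotments and different error populations on a common scale.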

The first step in the data analysis is shown in Tables 26 and
27, which reflect the experimental design. Each subject was randomly
assigned to one of two groups, and each group was assigned four of the
eight combinations of program and test data type (see Chapter IV).
This design allows within-subject estimates of main and second-order
effects to be made.

Following Winer, the comparisons associated with the effects
are computed by calculating the weighted sums of treatment combination


Table 26

Sums of MFEH for Group I

Subject No.   ITAX-BB   LNPR-BB   OPTM-WB   SCNR-WB    Total

     1          30        6.25     12.5       2.15     50.9
     2          70       14.58     25.0       4.3     113.68
     3          40       20.83     25.0       9.68     95.51
     4          60       14.58     31.25      9.68    115.51
     5          40       10.42     31.25      5.38     87.05
     6          50        2.08     31.25      2.15     85.48
     7          10        6.25     25.0       3.23
     8          30        4.17     18.75      3.23     56.15
     9          20        2.08     18.75      0        40.83
    10          80       14.58     37.5       8.6     140.68
    11          30        2.08     25.0       3.23     60.31

  Total        460       97.9     281.25     51.63


Table 27

Sums of MFEH for Group II

Subject No.   ITAX-WB   LNPR-WB   OPTM-BB   SCNR-BB    Total

                50        0        18.75     10.75     79.5
                40        6.25     18.75      4.3      69.3
                60        6.25     31.25      9.68    107.18
                10        4.17     31.25      2.15     47.57
                50        0        12.5       1.08     63.58
                30        4.17     37.5       1.08     72.25
                30        8.33     31.25      7.53
                70        8.33     31.25      5.38
                40       16.67     31.25     15.05
                50        6.25     31.25      2.15
                60        2.08     31.25      2.15

  Total        490       62.5     306.25     61.3

totals, shown in Table 28. These comparisons indicate a strong
negative effect of logical complexity (LC) and computational content
(CC) upon error discovery, as was to be expected. The table also
indicates a large positive interaction effect between these factors.
This was also to be anticipated, as the MFEH's for LNPR (high LC, high
CC) were higher than those for SCNR (high LC, low CC) for many
subjects. The summary of this analysis of variance is shown in Table
29. The LC and CC main effects, as well as their interaction, are
clearly significant when compared with the critical value
$F_{.99}(1,61) = 7.08$.
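As a rough illustration of how such comparisons are formed as weighted sums of treatment totals, the sketch below applies ordinary 2 x 2 factorial contrast weights to the column totals of Tables 26 and 27. Several things here are assumptions rather than the procedure of the study: only the levels of LNPR and SCNR are stated in the text, so the low-LC levels given to ITAX and OPTM are guesses; the test data type factor is ignored; and Winer's procedure for this design may weight the totals differently.

```python
# Treatment totals of MFEH per program, summed over both groups
# (column totals of Tables 26 and 27).
totals = {
    "ITAX": 460.0 + 490.0,
    "LNPR": 97.9 + 62.5,
    "OPTM": 281.25 + 306.25,
    "SCNR": 51.63 + 61.3,
}

# ASSUMED factor levels as (LC, CC), with +1 = high and -1 = low.
# LNPR and SCNR follow the text; ITAX and OPTM are assumptions.
levels = {
    "ITAX": (-1, -1),
    "OPTM": (-1, +1),
    "SCNR": (+1, -1),
    "LNPR": (+1, +1),
}


def comparison(weight):
    """Weighted sum of treatment totals for one contrast."""
    return sum(weight(*levels[p]) * totals[p] for p in totals)


lc_effect = comparison(lambda lc, cc: lc)
cc_effect = comparison(lambda lc, cc: cc)
interaction = comparison(lambda lc, cc: lc * cc)
print(round(lc_effect, 2), round(cc_effect, 2), round(interaction, 2))
```

Under these assumed levels the two main-effect comparisons come out negative and the interaction comparison positive, which is the pattern of signs described in the text.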

The surprising and counterintuitive result is the positive
effect of the LC-CC interaction. The subjects found, on the average,
more errors within SCNR than within LNPR, but SCNR had almost twice
as many known errors, leading to a smaller mean percentage of errors
per hour. SCNR was shown in an earlier section to be composed of more
difficult errors, namely those with low visibility. Apparently this
difference more than offset the greater number of errors within SCNR.
With larger programs like SCNR and LNPR the effects of successively
larger error discovery times become important. Hence the effect of
doubling the number of errors in a program is apparently not
reciprocally offset by a proportional increase in error discoveries.
This fact may have made SCNR more difficult to debug than LNPR, even
though the latter had a higher degree of computational content.

A  Distributional  Model  of  Discovery  Times 

It is reasonable to assume that the search for program errors
is indiscriminate, in that, a priori, no error is singled out as the
object of pursuit prior to its discovery. Hence, one may conclude
that the search for all errors proceeds simultaneously in parallel
time, and that the differences among discovery times are attributable
to the error "obviousness," the programmer's skill in pursuit, and the
effects of randomness. The simplest probability model describing the
likelihood of error discovery is that of the negative exponential
distribution. In adopting it, one assumes that, for an arbitrary error,
the probability of finding the error in the next small time quantum is
identical irrespective of the length of time devoted thus far to the
search.
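The "identical irrespective of time already spent" assumption is the memoryless property of the negative exponential distribution. A minimal numerical check is sketched below; the rate and the time values are arbitrary illustrative choices.

```python
import math


def survival(t, rate):
    """P(discovery time > t) under a negative exponential model."""
    return math.exp(-rate * t)


def conditional_survival(extra, elapsed, rate):
    """P(T > elapsed + extra | T > elapsed)."""
    return survival(elapsed + extra, rate) / survival(elapsed, rate)


rate = 0.5  # hypothetical discovery rate, per hour
for elapsed in (0.0, 1.0, 4.0):
    # The chance of surviving another quarter hour undiscovered does not
    # depend on how long the search has already lasted.
    print(elapsed, conditional_survival(0.25, elapsed, rate))
```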


It was further assumed that the differences between the
relative proficiencies of any two subjects could be approximated as a
linear effect on the subject's rate of error discovery. The relatively
good concordance of error discovery time ranks lends some support to
the hypothesis that a subject's skill in debugging is reasonably
independent of the particular errors left to discover. In conformance
with this assumption, each subject's proficiency was hypothesized to be
expressible in the form of a discovery rate multiplier $\rho$ where, for
example, $\rho_i/\rho_j$ equals the ratio of subject i's discovery rate
to subject j's for an arbitrary error.

A similar assumption was made regarding the relative
difficulties of specific errors. Associated with each error j is
hypothesized a difficulty index $\alpha_j$, which equals the basic
discovery rate for an error; hence, differences in error difficulty are
assumed to be condensible into an arrival rate which is independent of
the subject or length of search. Each subject's inherent debugging
skill is represented as the subject discovery rate $\rho_i$. This "skill
index" is assumed to be independent of the particular errors being
pursued and constant over the debugging interval. Each subject is
assumed to have spent some amount of initial time $\mu_i$ (that was
measured as debugging activity) reading the specifications or
performing some other activity during which error discovery was
impossible. The associated density function is given by

$$f(t_{ij} \mid \alpha_j, \rho_i, \mu_i) = (\alpha_j \rho_i)\, e^{-(\alpha_j \rho_i)(t_{ij} - \mu_i)}$$


Each error is assumed to be the subject of a search which is
independent of the search being conducted simultaneously for all other
remaining errors. Furthermore, the activities of each subject are
assumed to be independent of all other subjects. Hence, one may assume
each error discovery time $t_{ij}$ to be an independent random variable,
and express the likelihood of error discoveries and non-discoveries as

$$L(\alpha,\rho,\mu) = \prod_{1 \le i \le M} \Big[ \prod_{j \in J_i} (\alpha_j \rho_i)\, e^{-(\alpha_j \rho_i)(t_{ij} - \mu_i)} \prod_{j \notin J_i} e^{-(\alpha_j \rho_i)(T - \mu_i)} \Big] \qquad (5.1)$$

which may be alternatively expressed as

$$L(\alpha,\rho,\mu) = \prod_{1 \le j \le N} \Big[ \prod_{i \in I_j} (\alpha_j \rho_i)\, e^{-(\alpha_j \rho_i)(t_{ij} - \mu_i)} \prod_{i \notin I_j} e^{-(\alpha_j \rho_i)(T - \mu_i)} \Big] \qquad (5.2)$$


For the formulae above and all other discussion of discovery times, the
following notations will apply:

• $N$ is defined as the number of errors in the program.

• $M$ is the number of debugging subjects.

• $N_i$ is defined as the number of errors found during debugging
by the i'th subject.

• $M_j$ is the number of subjects finding the j'th error.

• $t_{ij}$ is the time to discovery of error number j, by the i'th
subject, where j is an arbitrary index over program errors.

• $t_{i(j)}$ is the time to discovery of the j'th error, by the
i'th subject, and is an order statistic, where $t_{i(0)} = 0$.

• $\Delta_{ij}$ is the time between the discoveries of the j'th and
(j-1)'st error, by the i'th programmer, and
$\Delta_{ij} = t_{i(j)} - t_{i(j-1)}$.

• $\mu_i$ is a location parameter designating the time at which the
i'th subject commenced debugging activity, and
$\mu = (\mu_1, \ldots, \mu_M)$.

• $\alpha_j$ is a difficulty index, the discovery rate associated with
the j'th error, for the "average" subject, and
$\alpha = (\alpha_1, \ldots, \alpha_N)$.

• $\rho_i$ is a proficiency index, the discovery rate associated with
the i'th subject for the "average" error, and
$\rho = (\rho_1, \ldots, \rho_M)$.

• $I_j \subseteq \{1, \ldots, M\}$, of cardinality $M_j$, denotes the
index set of subjects who discovered error number j.

• $J_i \subseteq \{1, \ldots, N\}$, of cardinality $N_i$, denotes the
index set of error numbers found by the i'th subject.

• $T$ is the time interval over which debugging was performed.

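The first form (5.1) is convenient for deriving the conditions for the
maximum likelihood estimator vectors
$\hat{\alpha} = (\hat{\alpha}_1, \ldots, \hat{\alpha}_N)$ and
$\hat{\rho} = (\hat{\rho}_1, \ldots, \hat{\rho}_M)$. The natural
logarithm of the likelihood function is given by

$$\ln L(\alpha,\rho,\mu) = \sum_{1 \le i \le M} \Big[ \sum_{j \in J_i} \big( \ln \rho_i + \ln \alpha_j - (\alpha_j \rho_i)(t_{ij} - \mu_i) \big) - \sum_{j \notin J_i} (\alpha_j \rho_i)(T - \mu_i) \Big] \qquad (5.3)$$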
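The log-likelihood (5.3) translates directly into code. The sketch below is an illustration only: the data layout (one dictionary of discovery times per subject) and all numbers are assumptions made for this example, not the experiment's data or the program actually used in the study.

```python
import math


def log_likelihood(alpha, rho, mu, times, T):
    """Log-likelihood of discoveries and non-discoveries, as in (5.3).

    alpha[j]: difficulty index of error j
    rho[i]:   proficiency index of subject i
    mu[i]:    starting time (location parameter) of subject i
    times[i]: dict mapping error index j -> discovery time t_ij;
              errors missing from the dict were not found by subject i
    T:        length of the debugging interval
    """
    total = 0.0
    for i, found in enumerate(times):
        for j, a in enumerate(alpha):
            rate = a * rho[i]
            if j in found:
                # Discovered error: log density of the shifted exponential.
                total += math.log(rate) - rate * (found[j] - mu[i])
            else:
                # Undiscovered error: log probability of surviving to T.
                total -= rate * (T - mu[i])
    return total


# Hypothetical data: two subjects, two errors, a 2-hour session.
times = [{0: 0.5, 1: 1.2}, {0: 0.8}]
alpha = [1.0, 0.4]
rho = [1.0, 0.6]
mu = [0.1, 0.2]
ll = log_likelihood(alpha, rho, mu, times, T=2.0)
print(ll)
```

Because only the products of a difficulty index and a proficiency index enter the function, multiplying every difficulty index by a constant and dividing every proficiency index by the same constant leaves the value unchanged.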


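Then

$$\hat{\alpha}_k = \frac{M_k}{\sum_{i \in I_k} \rho_i (t_{ik} - \mu_i) + \sum_{i \notin I_k} \rho_i (T - \mu_i)} \qquad (5.6)$$

In the case that all errors are assumed to be of equal difficulty, then
$\alpha_j = \alpha$ for all j, and

$$\hat{\lambda}_i = \alpha \hat{\rho}_i = \frac{N_i}{\sum_{j \in J_i} t_{ij} + (N - N_i)\,T - N \mu_i} \qquad (5.7)$$

where $\hat{\lambda}_i$ estimates the arrival rate for the discovery
time distribution associated with each of the M subjects.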
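For the equal-difficulty case, the estimate of a subject's arrival rate is the number of discoveries divided by the total time at risk, summed over found and unfound errors and corrected for the starting time. A small sketch follows, with a hypothetical subject and hypothetical times; the function name is an assumption.

```python
def subject_rate(found_times, n_errors, T, mu):
    """Arrival-rate estimate for one subject when all errors are equally hard.

    found_times: discovery times of the errors this subject found
    n_errors:    total number of errors N in the program
    T:           length of the debugging interval
    mu:          the subject's assumed starting time
    """
    n_found = len(found_times)
    exposure = sum(found_times) + (n_errors - n_found) * T - n_errors * mu
    if exposure <= 0:
        raise ValueError("non-positive total exposure time")
    return n_found / exposure


# Hypothetical subject: 3 of 10 errors found in a 2-hour session, with the
# starting time set to one-half of the first discovery time.
found = [0.4, 0.9, 1.5]
rate = subject_rate(found, n_errors=10, T=2.0, mu=found[0] / 2)
print(rate)
```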

The main purpose in hypothesizing the existence of an
independent subject skill index ($\rho$) was that it permitted this
investigator to aggregate the discovery times over subjects to obtain
more reliable estimates of distributional parameters. A program was
written to maximize the likelihood function (5.1) by searching over the
parameter space. This program jumped from parameter space point
$(\alpha,\rho)^{(k)}$ to point $(\alpha,\rho)^{(k+1)}$ by means of a
steepest ascent algorithm based upon the Newton-Raphson method. The
program converged relatively quickly and consistently to solutions.
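The search program itself is not reproduced in the text. As a stand-in, the sketch below maximizes the same likelihood by a different and simpler technique: it alternates the closed-form conditional maximizer for each difficulty index with the analogous one for each proficiency index. This is coordinate ascent, not the Newton-Raphson-based steepest ascent described above, and the data are hypothetical. It assumes every subject found at least one error and every discovery time exceeds the subject's starting time.

```python
def fit(times, n_errors, T, mu, sweeps=200):
    """Alternate closed-form updates for alpha (difficulty) and rho (skill).

    times[i] maps error index -> discovery time for subject i; errors
    missing from the dict are treated as undiscovered by time T.
    """
    m = len(times)
    alpha = [1.0] * n_errors
    rho = [1.0] * m
    for _ in range(sweeps):
        # Best alpha_k for the current rho: finders / weighted exposure.
        for k in range(n_errors):
            finders = sum(1 for i in range(m) if k in times[i])
            exposure = sum(
                rho[i] * (times[i].get(k, T) - mu[i]) for i in range(m)
            )
            alpha[k] = finders / exposure
        # Best rho_i for the current alpha, by the same argument.
        for i in range(m):
            exposure = sum(
                alpha[j] * (times[i].get(j, T) - mu[i]) for j in range(n_errors)
            )
            rho[i] = len(times[i]) / exposure
        # Only products alpha_j * rho_i are identified, so fix the scale
        # by setting the smallest proficiency index to 1.
        c = min(rho)
        rho = [r / c for r in rho]
        alpha = [a * c for a in alpha]
    return alpha, rho


# Hypothetical data: three subjects, three errors, a 2-hour session.
times = [{0: 0.3, 1: 0.9, 2: 1.6}, {0: 0.5, 1: 1.4}, {0: 0.7, 2: 1.8}]
mu = [0.1, 0.2, 0.3]
alpha_hat, rho_hat = fit(times, n_errors=3, T=2.0, mu=mu)
print(alpha_hat, rho_hat)
```

Fixing the smallest proficiency index at 1 is an arbitrary normalization chosen for this sketch; any other choice of scale gives the same fitted rates.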

This program was run with two settings for the location
parameter: $\mu_i = t_{i(1)}$ (the first discovery time) and
$\mu_i = t_{i(1)}/2$. The choice of $\mu_i = t_{i(1)}$ was due to its
role as the maximum likelihood estimator (MLE) for $\mu_i$. This MLE is
always positively biased, however, and a more reasonable estimate of
subject starting time was sought. Under ideal experimental conditions,
$\mu_i$ could be assumed to be 0, as subjects would presumably commence
with error pursuit immediately. Conversations with some of the
subjects, together with comments on the activity logs and unusually
large first arrival times, however, made it evident that some
nonspecific preparatory time had been spent by subjects during which
the discovery of errors was impossible. It was arbitrarily assumed that
one-half of the time to each subject's first discovery was uncountable.
This assumption is probably in error for those subjects with short
first discovery times, but it also has little effect on the estimators,
since the assumed $\mu_i$ can differ from the true $\mu_i$ by only a
small amount. Those subjects with long first discovery times may well
have spent longer than one-half of this time reading the specifications
or in some other preparatory activity. By not recognizing this, the
errors eventually found will have overstated arrival times, and this
will tend to decrease the errors' estimated arrival rates. In any
event, no better solution seemed evident.

Maximum likelihood estimates $\hat{\alpha}$ and $\hat{\rho}$ were
determined by the search program for each of the four experimental
programs. Not surprisingly, the discovery indices for errors
($\hat{\alpha}$) corresponded well to the error discovery frequencies.
Similarly, the proficiency indices ($\hat{\rho}$) were in general
conformance with the number of errors found by each subject. For fixed
and known $\mu$, these estimators are asymptotically unbiased (as N
increases). Because $L(\alpha,\rho,\mu)$ is not a concave function, it
is not justifiable to use the inverse matrix of second partial
derivatives (evaluated at $\hat{\alpha}, \hat{\rho}, \mu$) as estimates
of the sampling covariances among parametric estimators (Bard, 1974).
Although these covariances are expected to be quite large, the general
relationship of parameters to one another is probably close to that of
the estimates, judging from the way that the search program slowly
converged to these estimates without wild fluctuations. The MLE's
$(\hat{\alpha},\hat{\rho})$ found for the likelihood function
$L(\alpha, \rho, \mu = t_{(1)}/2)$ are approximations to one of an
infinite set of MLE vector pairs. Since every term in (5.1) involving
$\alpha_j$ is always multiplied by some $\rho_i$, for any maximum
$(\hat{\alpha},\hat{\rho})$ there exists a maximum
$(c\hat{\alpha}, \hat{\rho}/c)$, where c is a scalar constant.

One consequence of the nonuniqueness of the MLE values
$(\hat{\alpha},\hat{\rho})$ is that they cannot be compared among
different experimental programs. Within any experimental program,
however, the ratios of any two subject proficiencies or error
difficulties can be computed without concern for the value of c. The
ratio of the smallest estimator value to each of the others has been
computed for the MLE's found for each of the four experimental programs
and is shown in Tables 30 through 33. The range of difficulty index
ratios is smallest for ITAX, the simplest of the experimental programs.
The remaining programs have a much greater range of difficulty index
ratios, indicating the presence of some distinctly nontrivial errors.

Table 30

Table 31

Table 32

Like the difficulty index ratios, proficiency index ratios
express a difference in error discovery rates. The proficiency index
ratios for ITAX, SCNR, and LNPR indicate a reasonably small range of
subject debugging rates, with the median rate about one-half that of
the fastest subject and twice that of the slowest. Debugging
proficiency for OPTM has a much greater range; the results indicate
that the fastest subject debugs approximately 20 times faster than the
slowest. It should also be remembered that a number of subjects were
not included in this analysis for the lack of a sufficient number of
discoveries. Hence, it may well be the case that proficiency ratio
ranges of 30 to 1 exist for all four programs. This surprisingly large
range in debugging rates agrees with the 27 to 1 range in debugging
performance reported by Sackman, Grant, and Erickson (1969) in an early
study of this kind.
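The ratio computation described above, in which each estimate is expressed relative to the smallest one, removes the arbitrary constant c. A minimal sketch follows; the index values and the constant are hypothetical.

```python
def ratios_to_smallest(estimates):
    """Express each estimate as a multiple of the smallest estimate."""
    smallest = min(estimates)
    if smallest <= 0:
        raise ValueError("estimates must be positive")
    return [e / smallest for e in estimates]


rho_hat = [0.8, 2.4, 1.6]  # hypothetical proficiency indices
c = 5.0                    # arbitrary rescaling constant
original = ratios_to_smallest(rho_hat)
rescaled = ratios_to_smallest([r / c for r in rho_hat])
print(original)
print(rescaled)
```

Both calls give the same ratios up to rounding, which is why ratios within a program can be interpreted even though the raw index values cannot be compared across programs.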


Test for Exponentiality


The maximum likelihood estimator values found for the model

$$f(t_{ij}) = (\alpha_j \rho_i)\, e^{-(\alpha_j \rho_i)(t_{ij} - \mu_i)}$$

were used to transform each $t_{ij}$ to a variable with an identical
distribution for all subjects and errors. Using $\mu_i = t_{i(1)}/2$
and the MLE's $\hat{\alpha}$ and $\hat{\rho}$ found by the search
routines, each discovery time was converted to a standardized variable
$z_{ij}$ with a unit negative exponential distribution. This was
accomplished through a change of variables, with

$$t_{ij} = \frac{z_{ij}}{\hat{\alpha}_j \hat{\rho}_i} + \mu_i, \qquad g(z_{ij}) = f\big(t_{ij}(z_{ij})\big)\, \frac{d t_{ij}}{d z_{ij}} .$$

The resulting distribution of $z_{ij}$ is simply $e^{-z_{ij}}$, and all
transformed discovery times are identically distributed, as desired.
The frequency distribution of these standardized discovery times is
shown in Figure
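The change of variables can be checked by simulation. The sketch below draws discovery times from the shifted exponential model with randomly chosen rates and starting times, standardizes them, and confirms that the results behave like unit exponential variables. Everything here (parameter ranges, sample size, seed) is an assumption for illustration, and the true parameters are used in place of estimates.

```python
import random


def standardize(t, alpha_j, rho_i, mu_i):
    """Map a discovery time to z = alpha_j * rho_i * (t - mu_i)."""
    return alpha_j * rho_i * (t - mu_i)


def simulate_standardized(n, seed=0):
    """Draw n discovery times under the model and standardize each one."""
    rng = random.Random(seed)
    zs = []
    for _ in range(n):
        alpha_j = rng.uniform(0.2, 2.0)  # hypothetical difficulty index
        rho_i = rng.uniform(0.5, 3.0)    # hypothetical proficiency index
        mu_i = rng.uniform(0.0, 0.5)     # hypothetical starting time
        t = mu_i + rng.expovariate(alpha_j * rho_i)
        zs.append(standardize(t, alpha_j, rho_i, mu_i))
    return zs


zs = simulate_standardized(20000)
mean_z = sum(zs) / len(zs)
print(mean_z)  # close to 1 for a unit exponential sample
```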


It is difficult to determine the fit of these data to a
negative exponential model, as a great number of errors went
undiscovered across the ensemble of subjects. Hence, to the right of
the histogram scale in the figure reside a number of "eventual"
discovery times, the exact values of which could not have been obtained
without extending the experimental session indefinitely. However, a
test for conformance to a negative exponential distribution does exist
for censored samples such as this.

The Gnedenko test is among the most powerful tests of
exponentiality for censored samples (Mann, Schafer, & Singpurwalla,
1974). The test statistic, Q, is derived by computing

$$Q(r_1, r_2) = \frac{\sum_{i=1}^{r_1} S_i / r_1}{\sum_{i=r_1+1}^{r} S_i / r_2}$$

where each $S_i$ is a weighted interarrival time of the form

$$S_i = (n - i + 1)\,\big(t_{(i)} - t_{(i-1)}\big), \qquad t_{(0)} = 0,$$

and n is the number of sample points. The terms $r_1$ and $r_2$
indicate an arbitrary partitioning of the r available arrival times,
where n-r data points are unavailable (censored). $Q(r_1, r_2)$ is
approximately F-distributed with $2r_1$ and $2r_2$ degrees of freedom.
Because the r ($= r_1 + r_2$) ordered arrival times can be split
anywhere for calculation of Q, an equal split ($r_1 = r_2$) is often
employed. When $r_1 = r_2$ the expected value for Q is approximately 1,
under the assumption that the arrival times are identically
distributed, negative exponential random variables.
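The statistic is straightforward to compute from the ordered, uncensored discovery times. The sketch below follows the definition of the weighted interarrival times given above; the function name and the sample values are assumptions for illustration.

```python
def gnedenko_q(ordered_times, n, r1):
    """Gnedenko's Q for the r smallest of n lifetimes.

    ordered_times: the r observed discovery times, in ascending order
    n:             total sample size (observed plus censored)
    r1:            number of leading weighted spacings in the numerator
    """
    r = len(ordered_times)
    if not (0 < r1 < r <= n):
        raise ValueError("need 0 < r1 < r <= n")
    r2 = r - r1
    spacings = []
    previous = 0.0
    for i, t in enumerate(ordered_times, start=1):
        # Weighted interarrival time S_i = (n - i + 1)(t_(i) - t_(i-1)).
        spacings.append((n - i + 1) * (t - previous))
        previous = t
    return (sum(spacings[:r1]) / r1) / (sum(spacings[r1:]) / r2)


# Hypothetical sample: 4 of 6 standardized times observed, equal split.
q = gnedenko_q([0.1, 0.3, 0.7, 1.5], n=6, r1=2)
print(q)
```

In this made-up sample the later weighted spacings are longer than the earlier ones, so Q falls below 1; a value of Q would then be compared against quantiles of the F distribution with $2r_1$ and $2r_2$ degrees of freedom.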

For each set of standardized discovery times, a Q value was
calculated. With samples of the sizes obtained from the experiment,
any Q value less than 2/3 or greater than 3/2 indicates a significant

discovery times generated a Q value substantially less than one. This
result is normally accepted as evidence of a decreasing hazard rate
(the instantaneous discovery rate). Even after normalizing each
error's discovery time by its difficulty index, it appears that error
discoveries become decreasingly probable as time goes on. Further
speculations regarding the cause of this phenomenon are given in the
conclusions.


Software  Reliability  Models 

The simplest and most-used model of software failure behavior
is that proposed by Jelinski and Moranda (1973) in which software
failure interarrival times are assumed to conform to a stepwise
decreasing negative exponential distribution. This was shown in Chapter
III to be equivalent to a model in which all program errors are
identically distributed, negative exponential random variables with
common mean time to failure and arrival rate. The preceding analyses in
this chapter have shown the error arrival rates to be significantly
different within a program, even after accounting for subject
differences. This finding is in agreement with Littlewood (1979), who
demonstrated that the Jelinski-Moranda model underestimated the mean
time to failure in many cases. Littlewood also showed that the
Jelinski-Moranda model dictated a distribution of number of errors
remaining (by time t) which was insufficiently skewed in the right tail
when compared to real data obtained from large programming projects.


These observations by Littlewood apply equally well to the
discovery times obtained from this experiment. The maximum likelihood
estimates for difficulty indices varied substantially from error to
error. If errors have a similar range of difficulty in all programs,
one would expect the distribution of the number of errors found by
time t to be significantly more skewed than if all errors were equally
difficult. Moreover, even when error discovery times were standardized
by the differing difficulty indices, the Gnedenko test indicated
(significantly) the effect of a decreasing discovery rate over time.
This finding further demonstrated the failings of the Jelinski-Moranda
model for the experimental data obtained in this study.

To demonstrate the inaccuracy of the stepwise decreasing
exponential model, the discovery times for all subjects were aggregated
(for each program) by computing discovery rates as shown in equation
(5.7) in the preceding section. These discovery rates were used to
create standardized discovery-time variables

$$z_{ij} = \hat{\lambda}_i (t_{ij} - \mu_i)$$

which normalized for the effect of different subject proficiencies and
starting times. In place of N in (5.7), a maximum likelihood estimate
$\hat{N}$ was used.


The development for these MLE's is straightforward. Each
subject found $N_i$ errors and failed to discover $N - N_i$ at a rate
$\lambda_i = \alpha \rho_i$, where $\alpha$ is the common error
difficulty index and $\rho_i$ the assumed subject proficiency index.
Discovery times $t_{ij}$ are reduced by $\mu_i$, equalling one-half the
first discovery time for each subject. Then the likelihood of any one
subject's particular error discoveries (including the order in which
they were found) is given by

$$\prod_{j \in J_i} \lambda_i e^{-\lambda_i t_{ij}} \cdot \big( e^{-\lambda_i T} \big)^{N - N_i} .$$

Errors are assumed to be indistinguishable, and so the joint likelihood
function for the ordered sample, including all subjects, is given by

$$L = \frac{N!}{(N-n)!} \prod_{1 \le i \le M} \Big[ \prod_{j \in J_i} \lambda_i e^{-\lambda_i t_{ij}} \cdot e^{-\lambda_i T (N - N_i)} \Big]$$

where $n = \sum_{1 \le i \le M} N_i$.

The MLE for each subject's discovery rate is derived as

$$\ln L = \ln N! - \ln (N-n)! + \sum_{1 \le i \le M} \Big[ N_i \ln \lambda_i - \sum_{j \in J_i} \lambda_i t_{ij} - \lambda_i T (N - N_i) \Big]$$

and

$$\frac{\partial \ln L}{\partial \lambda_k} = \frac{N_k}{\lambda_k} - \sum_{j \in J_k} t_{kj} - T (N - N_k)$$

implying

$$\hat{\lambda}_k = \frac{N_k}{\sum_{j \in J_k} t_{kj} + T (N - N_k)} .$$

An estimator $\hat{N}$ for total program errors is determined by
calculating

$$\sum_{0 \le i \le n-1} \frac{1}{N - i} - T \sum_{1 \le k \le M} \hat{\lambda}_k = 0$$

and finding the root $\hat{N}$ of this equation in N. The estimates
$\hat{N}$ were computed for each of the four experimental programs. The
true values




For each experimental program the ordered observation vector
was composed of all standardized discovery times recorded for all
subjects in the reduced discovery time set (see Tables 14 through 17).
A standardized time for the i'th subject and error j was calculated as

$$z_{ij} = \hat{\lambda}_i (t_{ij} - \mu_i)$$

by using the sets of maximum likelihood estimates $\hat{\lambda}_i$ and
$\mu_i$ given in Tables 29 to 32. It was assumed that this
normalization would adjust for individual subjects' starting times and
discovery rates, and yield standardized times for aggregation. The set
of times $\{z_{ij}\}$ was then ordered to produce the order statistic
vector $Z_{(i)}$.

A program was written to conduct a Newton-Raphson search over
the parameter space $\{(\alpha,\beta,n)\}$ for constant values given by
$Z_{(i)}$ and k, and with the objective function

$$\ln L(\alpha,\beta,n) = \ln(n!) - \ln((n-k)!) + n\beta \ln(\alpha) + k \ln(\beta) - \beta(n-k)\ln(Z_{(k)} + \alpha) - (\beta+1)\sum_{i=1}^{k} \ln(Z_{(i)} + \alpha) .$$

As can be seen from the log-likelihood function, for fixed n, increases
in $\alpha$ could be paired with decreases in $\beta$ and result in
similar objective function values. Unlike the
$(\hat{\alpha},\hat{\rho})$ MLE search for equation (5.1), the
reciprocal nature of $(\alpha,\beta)$ pairs resulted in prolonged
searches over the $\{(\alpha,\beta,n)\}$ space. This iterative search
yielded log-likelihood values differing by a fraction of a percent for
vastly differing MLE triples.
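The objective function above can be evaluated as follows. This is a sketch under stated assumptions: the log-gamma function stands in for the log-factorials so that n may vary continuously during a search, the function name is hypothetical, and the sample values are made up for illustration.

```python
import math


def gamma_exp_loglik(alpha, beta, n, z):
    """Log-likelihood of the k smallest of n standardized discovery times.

    alpha, beta: parameters of the mixing distribution
    n:           assumed error population size (may be non-integer)
    z:           ascending list of the k observed standardized times
    """
    k = len(z)
    if n < k or alpha <= 0 or beta <= 0:
        return float("-inf")
    return (
        math.lgamma(n + 1) - math.lgamma(n - k + 1)  # ln n! - ln (n-k)!
        + n * beta * math.log(alpha)
        + k * math.log(beta)
        - beta * (n - k) * math.log(z[-1] + alpha)
        - (beta + 1) * sum(math.log(zi + alpha) for zi in z)
    )


# Hypothetical sample: three observed times out of an assumed five errors.
z_obs = [0.2, 0.5, 1.1]
value = gamma_exp_loglik(alpha=1.5, beta=2.0, n=5, z=z_obs)
print(value)
```

Evaluating the function over a grid of parameter triples is one way to look for the flat ridge that the text reports, where quite different triples give nearly equal values.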



The optimal MLE values found by the search program were
uniformly discouraging. In every case, the MLE $\hat{n}$ converged upon
a value slightly greater than k, the number of known error discoveries,
as shown in Table 35.

Table  35 

Comparison  of  True  and  Estimated  Error  Population  Size 


Program 


ITAX 


True 

n 

Estimated 

* 

n 

96 

52 

99 

56 

96 

77 

170 

76 

The search program was re-executed to search over $\{(\alpha,\beta)\}$,
fixing n at the true value for each experimental program; the resulting
log-likelihood values differed very little from the optimal values.

This evidence was sufficient to convince us that the
Gamma-Exponential model was ill-suited for describing discovery time
behavior. Like Littlewood (1978) we could have used successive MLE's to
determine if successive ordered discovery times produced uniformly
distributed fractiles when applied to the next-discovery-time
distribution. This was not done for two reasons. First, the
(unconditional) discovery time distribution was shown to be
non-exponential by the Gnedenko test (in contradiction to our model),
and the gamma distribution for difficulty indices was primarily chosen
for its flexibility and mixing compatibility with the exponential (as
opposed to having chosen it for some form of representativeness).
Second, the proper estimation of error population size seems to us an
important property of a software reliability distribution, and this
property should be a necessary property of an adopted model.


CHAPTER  VI 


FINDINGS,  CONCLUSIONS,  AND  SUGGESTIONS 
FOR  FURTHER  RESEARCH 

Findings 

An  exploratory  study  was  conducted  to  determine  what  personal, 
environmental,  and  program-related  factors  affect  the  process  of  dis¬ 
covering  errors  in  computer  software.  In  a  controlled  setting,  a 
group  of  software  professionals  searched  for  errors  in  programs  whose 
error  composition  was  known,  and  which  had  been  developed  to  represent 
various  degrees  of  computational  content  and  logical  complexity.  The 
subject  group  exhibited  a  wide  range  of  ages,  experience,  and  educa¬ 
tion,  whereas  less  diversity  existed  among  subjects’  self-assessment 
and  prior  exposure  to  the  program  applications.  Subject  performance 
was  measured  by  the  number  of  errors  each  found  and  this  performance 
measure  was  positively  correlated  with  both  general  and  computer 
science-related education. Other analyses indicated that the most
important factor in performance was the amount of recent programming
experience held by a subject. Subject performance also differed from
program to program, and an analysis of variance led to the singling
out of logical complexity as the most significant factor, with the
degree of computational content a distant (but significant) second
and test data type exhibiting no significant effect on subject
performance. A significant negative interaction effect between logical
complexity and computational content signified deficiency in the
measurement of debugging performance.

An analysis of the errors found by subjects indicated that
logical and data-handling errors were more difficult than computational
errors, and that the total set of errors represented a wide range of
associated difficulties. There existed a similarly wide range in
subject proficiencies. Maximum likelihood estimates of error
difficulties and subject proficiencies were obtained, and a test for
exponentiality of normalized discovery times indicated a significant
departure from exponentiality, due to the presence of a decreasing
instantaneous discovery rate over time. To this decreasing discovery
rate can be attributed the subsequent overestimation of error
population size by the Jelinski-Moranda and our proposed software
reliability models.

Problem  Background 

The program error has served this study as the single subject
of research and experimentation. Yet, ten years ago this study would
have been impossible to conduct properly, because of the crucial lack
of a conceptual framework with which to view the programming process.
During the first thirty years of the computer era, the program error
was often viewed as some form of retribution to those who were not
properly trained or sufficiently creative software artists. During
the 1970s, however, the program error has risen in stature.
Methodologies have evolved that would help limit its numbers, pursue
its incarnations, and estimate the number of survivors.


This interest in error prevention, detection, and correction
has evidenced itself in functional computer research areas: Operating
systems must not only allocate resources but also protect the entire
software system from the damage that may result from errors hiding in
one system element; programming languages must now be more than
comprehensive, powerful, and efficient, for they must help express
intent and facilitate proofs of program correctness; hardware must
exhibit fault-tolerance as well as fault detection.

Alongside the growth of these traditional areas of computer
study have evolved disciplines addressing directly the problems of
software "mis-construction." These areas may be loosely aggregated
as the field of validation research, and further bifurcated into the
realms of verification and testing. Verification research is largely
a product of the computer science community and clearly the more
"optimistic" of the two approaches. The most important goal of
verification research is the development of conceptual and material
tools with which to prove program correctness before, during, and after
its development. The long-range aim of verification research is the
development of a language with which to express problems and the
identification of correctness-preserving transformations that guarantee
the absence of errors. Until that end is achieved there is the clear
need for methodologies which aid in error discovery.

These methodologies include the area of software testing, which
has gained increased acceptance as an important research area as well
... The questions addressed within testing ... complete test sets and
the procedures for generating flowchart-covering input data. Of primary
concern is the notion of test criterion reliability: the demonstration
that a procedure for generating test data will find any errors in a
program.

The most immediate and obvious response by the computing
community to the crisis in software reliability has been through the
expansion and redefinition of the testing phase of software
development. Most of what is now testing was once considered an
integral part of program writing, a part that should not have been
necessary, but (somehow) always was. An enlightened and mature software
development community is now beginning to consider testing in the same
positive way that product assurance is conducted in other manufacturing
processes. The painful experiences of the past have forced software
professionals to develop testing innovations like automated test tools,
requirements-based testing procedures, and independent testing
departments.

Between the rigorous expositions of formal testing theorists and
the published recommendations of testing practitioners lies the body
of literature regarded as Software Engineering. The title of software
engineer indicates an expression of professionalism and acceptance of
responsibility beyond that of the "sensitive software artist." Like
practitioners of other engineering disciplines, the software engineer
seeks to develop methods and procedures which bring precision into the
construction of software architectures.

The foremost interest among Software Engineers is Reliable
Software. It is the theme of conferences and the topic of journals.
Having reliable software as a goal is the admission of the
inevitability of systemic imperfection. A Reliable Software product
(as opposed to a correct one) is a system which performs what is
expected of it in a satisfactory manner over a reasonable period of
time. The reliable system is attained by gathering information about
its behavior and correcting deviations in this behavior from that
intended. Viewed in this fashion, the minimization of errors becomes
the dual of the problem of maximizing reliability.

Software Reliability Theorists model the attainment of program
reliability as a process obeying general principles, irrespective of
the software being tested, the testing approach used, or the
individuals involved. The search for errors in the time domain is
viewed as a stochastic process with error discovery as the significant
random event. The overriding motivation for this viewpoint is
prediction. If the reliability of a software system is a function of
time, then the determination of that function permits the software
engineer to predict the time necessary to achieve a given level of
reliability, the estimated number of remaining errors, and other
indicators of reliability growth. In the real world of software
development, these predictions must be made anyway, since budgets must
be set and manpower planned. Whether software reliability theory is
yet mature enough to merit the attention of software developers is the
subject of debate. The fact remains, however, that in the absence of
models of program behavior and reliability growth, the software
developer is left only with intuition and speculation.


Experimental  Approach 

This study was initiated so that a better understanding of
software validation might be gained through the comparison of current
approaches and the analysis of experimental results. It was apparent
from discussions with leading practitioners and researchers in the
field that at least two distinct approaches were available to the
researcher seeking to test models of the error discovery process. The
deterministic view is held by most computer scientists and
practitioners. The success of a software development project is seen
as a function of the development environment and the nature of the
project in question. This approach to understanding the "software
problem" focuses upon the influence of tools, personnel, and test
methods upon the quality of the software product. The second approach
is tied to the stochastic model of software reliability theorists, in
which any software good is viewed as belonging to a group whose
members conform to an approximate model of aggregate behavior. Both
of these approaches are important in the same way that micro- and
macroeconomics participate in describing economic behavior.

An experiment was designed so that data could be collected and
analyzed using both approaches for understanding program errors and
their discovery. From the many variables differentiating software
testing situations, three "objects" were chosen as the most important
for analysis. First, it was decided that the variability among people
may account for many of the differences in software testing success.
A group of 22 subjects was selected to participate in a debugging
experiment in which some relationships might be discovered between
subject background and characteristics, and aspects of their
performance. These subjects were professionals in the computing
field, and were intentionally chosen to represent a variety of
experiential and academic backgrounds.

The second source of experimental variability was the set of
programs given to the subjects for debugging. Degree of logical
complexity (LC) and computational content (CC) were chosen as the two
program characteristics that might have the greatest effect on one's
ability to find errors. An objective measure of each characteristic
was decided upon and one program was written to represent each of the
four combinations of factor settings: low LC, low CC (ITAX); low LC,
high CC (OPTM); high LC, low CC (SCNR); high LC, high CC (LNPR). Each
program was thoroughly debugged and each error found was documented,
then categorized. These errors were reinserted into the experimental
programs, which were then given to the subjects for testing and error
correction.

The third controlled experimental factor was the type of input
data given to each subject for the purpose of generating test results.
One type of test data was generated by "black-box" methods: the
development of data which checks the conformance of program processing
to the specifications. In addition to these black-box data, each
program's control graph was determined and, from an analysis of
execution paths, sets of "white-box" data were developed. These two
test generation procedures correspond to the two extremes between
which resides the proper balanced approach of combining functional
testing with testing based upon program structure.


Conclusions 

Although many conclusions can be drawn from the relationships
made apparent during this research, the evidence supporting these
conclusions was as much the result of ongoing subjective analysis as
it was the explicit testing of preformulated hypotheses. Because this
was an exploratory study, such things as the relationship between
program complexity and failure time distributions were completely
obscure before the experiment was conducted and are by no means
transparent now. We will attempt to segregate those conclusions drawn
from objective analysis from those whose underpinnings are based more
upon opinion. It is hoped, in this way, that future researchers may
concentrate their energies on the retesting and interpretation of the
latter realm of observations.

The novelty in this study rests in the choice of true replicates
for the observations of debugging performance. To our knowledge, only
Howden (1978) and Hetzel (1975) have employed identical programs as
the instrument stimulating observations which can be compared.
Howden, however, was the sole subject of his own experiment, employing
various test criteria (as mechanically as possible) on a series of
programs to judge the efficacy of these criteria. Hetzel's experiment
was most similar to ours, and in fact serves as the model for many of
the procedures which we adopted and some which we avoided. Hetzel
wrote three programs, reinserted the errors, and had a group of
subjects conduct a timed search for evidence of these errors. This
evidence could be "conclusive," that is, the discovery of improper
program statements, or "indirect," that is, the observation of
improper output. Moreover, Hetzel's subjects were not permitted to
modify the experimental programs, and were timed indiscriminately
during all phases of program debugging. Lastly, Hetzel's subject
sample was reasonably homogeneous: young programmers and graduate
students with backgrounds in the computer sciences.

We endeavored to inject some further precision into this format,
while broadening the range of experimental objects so that the results
could be more generally applied. The added precision resulted from a
conscious, objective search for programs which differed from one
another in measurable ways, and the partitioning of debugging activity
into the subactivities of code/results review, error correction, and
test data construction. Many of the subjects complained
good-naturedly that the experiment had seemed like going to work,
except that it had been a longer day. This was exactly the result
that was desired. The subjects could work at their own pace with few
constraints on the manner in which they normally found and corrected
errors.

The subject set was remarkably diverse in background. The range
of organizations from which the subjects were recruited and the
subject selection criteria were formulated to help ensure a broad
sample. The older subjects typically had less training in computer
studies and many had acquired their knowledge and proficiencies on the
job. The younger subjects tended to have more formal education in
computing and a better familiarity with some of the algorithmic models
employed in SCNR and LNPR.


The most obvious question of interest regarding the outcomes is
"Who found the most errors and why?" Five of the top seven performers
were in their late twenties, all but one had done master's work in a
technical field, and all had four or more continuous years of recent
programming experience. We conclude that to maintain proficiency in
debugging it is necessary to continue its practice. In itself, this
is not surprising, as atrophy of other technical skills is a
well-known phenomenon. It is surprising, however, that ceteris
paribus this aspect was more important than familiarity with the
problem or formal training in computer studies. Prior training in a
technical area was also predominant among the top performers. It is
difficult to draw conclusions from this, however, as it is impossible
to determine whether these academic fields simply appeal to those with
programming aptitude or engender some reasoning powers in the student
that later manifest themselves in programming proficiency.

Subject age, and variables positively correlated with it,
frequently entered the regression analysis on performance, nearly
always with a negative coefficient. The simple correlation
coefficient of age and number of errors found was highly negative, and
only three of the nine subjects over thirty found more than the median
number of errors. Nevertheless, we do not believe age in itself to be
an important factor in predicting debugging performance. Many of the
older subjects had only moderate recent experience in day-to-day
programming, and had academic training that either excluded computer
studies altogether, or consisted of computer classes of questionable
merit in the distant past.



The quality of experience also appeared to have an influence on
debugging performance. Those subjects whose normal programming
involved system software were uniformly above the median. Conversely,
subjects whose programming duties revolved around normal data
processing (the applications of conventional businesses) found far
fewer errors, and the more conventional their day-to-day programming
tasks (e.g., programming and maintenance of accounting systems), the
worse their performance. One reason for this finding may be the
rather esoteric nature of three of the experimental programs, LNPR,
SCNR, and OPTM. However, many of the highest performing subjects
expressed no prior familiarity with one or more of the applications.
Moreover, the five best performers overall also scored highest on
ITAX, a reasonably conventional application.

Educational background was measured in a variety of ways,
including highest degree attained and number of units earned. In
terms of simple correlations with total performance, it appeared that
more schooling led to a greater number of errors found. In many of
the regression analyses, however, the only educational variable to
enter with a positive coefficient was highest degree earned. Because
many of the subjects were still going to school, educational
attainment measures were highly confounded with age and experience
measures, as well as with one another. Moreover, the quality of the
educational experiences measured was decidedly varied within the
sample. Aggregated together were part-time and full-time enrollments,
Ivy League and training school educations, a mid-sixties art major,
and a late-seventies computer science graduate. Any further research
attempting to relate education to programming performance should
consider that even the five educational measures employed in this
study were not sufficiently discriminating.

Both experience in general and experience in programming
exhibited mildly negative correlations with error discovery, although
neither variable ever entered the regression analysis for any of the
experimental programs or for total performance. Because a moderate
amount of recent experience seemed to be a deciding factor in subject
performance, the young subjects with little experience and the older
subjects with much past experience produced insignificant regression
results through the confounding of age and experience. It should be
pointed out, however, that the top five performers each had
professional experience in programming that was well above the median.

Familiarity with the environmental aspects of the experiment was
measured by asking each subject about his/her past experience with the
BASIC language and interactive systems. The responses to these
questions appeared to have no bearing on subject performance. This is
not particularly surprising, as the BASIC subset and interactive
system chosen were relatively unsophisticated and easy to use. There
apparently exist no tertiary relationships relating the subject's
normal programming environment to performance in this experiment. One
reason for this may be that each subject had the opportunity to debug
in the manner with which he/she felt most comfortable. Some subjects
used the computing system in a "batch-like" fashion, studying the
listing for some time while accumulating and recording errors found,
then making corrections en masse to produce a new run and listing.
Others employed what one subject termed the "run-and-gun" method,
practicable only on interactive systems. This approach entailed
frequent interactions with the computing system, applying program
changes as soon as possible, some exploratory, some corrective. This
difference may reduce to differences in the subjects' need for
stimuli, attention span, or discipline, and would serve as the basis
of an interesting study of its own.

Prior to the experiment each subject was asked to evaluate
himself/herself in 12 programming-related areas. The subjects were
asked to be as objective as possible and were given no reference with
which to make comparison. From the repeated assurances of anonymity
that were demanded by the subjects, it was felt that these assessments
were reasonably honest evaluations on the part of the subjects. A
simple sum of ranks was computed for each subject for an analysis that
led to surprising results. The sample range for these total
self-assessment scores was very small, and the simple correlation with
total performance was nearly zero. The top six performers' total
self-assessment scores were sprinkled uniformly throughout the range.

Individual self-assessment categories exhibited slightly stronger
relationships to performance, with "knowledge of computer science
principles" and "knowledge of algorithms" showing significant positive
correlations with error discovery. In the regression analyses, only
"knowledge of computer science principles" and "knowledge of data
processing principles" entered the regression equation more than once,
the former with positive coefficients, the latter with negative
coefficients. This appears reasonable, as many of the top performers
had a more technical background, whereas many of the poor performers
were engaged in more mundane data processing work.

We find the lack of discriminating power of these
self-assessments to be quite remarkable. When given no reference
group with which to compare themselves, subjects on the whole tended
to rate themselves uniformly above average. For insight into the
reason for these homogeneous assessments, one need only study one of
the self-assessment categories. In the mathematics category, for
example, some subjects with sound technical backgrounds and some with
two years of business school education rated themselves as average. A
similarly mixed set of individuals rated themselves as strong,
including a system programmer with a master's degree in mathematics
and a systems analyst with a degree in public administration. One
explanation for these results is that one's standards of excellence
increase as one acquires education and experience. Another
explanation may be that an unreferenced self-assessment, like that in
this study, tends to measure self-confidence, a trait that is not
necessarily correlated with education, experience, or proficiency.

In contrast to the self-assessment categories, questions
regarding subjects' prior experience with problems similar to the
experimental programs appeared more closely related to performance.
Responses to these familiarity categories, however, were not
necessarily well correlated with performance on the programs indicated
in the questions, with the exception of SCNR. Instead, these
indications of prior exposure to the theory and application of each
experimental program were, as a group, related to all performance
measures. It would appear, therefore, that questions regarding prior
exposure are more discriminating than generalized self-assessment, and
decidedly important in all debugging environments.

Notwithstanding the reasonably small size of the experimental
programs, it is felt that conclusions regarding debugging performance
can be generalized to the components of medium- and large-scale
software. The composition of error classes within the experimental
programs was quite similar to that reported in other studies of
medium- and large-scale software (see Chapter II). Subject success in
finding errors of each class was also in conformance with that
commonly proposed in the literature. The most interesting result of
this section of the error analysis was the evidence that error
discovery is closely related to the visibility of program
misconstruction. This was no doubt due in part to the availability of
explicit specifications with which subjects could "pattern-match" the
code. Further research along these lines could, however, provide a
sounder basis for explaining why certain error types are more readily
found than others, which may, in turn, lead to beneficial influences
on future language design and test tools.

An analysis of variance was performed to determine the effects of
logical complexity, computational content, and test data type upon
debugging performance. The overwhelming effect of logical complexity
and the moderate effect of computational content upon error discovery
were predictable and readily observable by studying the raw data. The
significantly negative interaction effect of logical complexity and
computational content was the most interesting finding of this
analysis, for it signified a failure in measurement, or linearity
assumptions, or both.

The reader may recall that logical complexity and computational
content were measured objectively by the application of formulae to
the programs' code (see Appendix B). Possible inaccuracies in these
metrics led us to dichotomize the measure range into high and low
regions, by which each factor setting was determined. The logical
complexity metric value for SCNR (14.8) was, however, quite a bit
larger than that for LNPR (10.7), just as the computational content
value for OPTM (84%) was greater than that of LNPR (74%). A more
accurate assessment of these factors' effects might have been obtained
by a multiple regression analysis employing metric values directly and
a dummy variable for test data type, in place of our analysis of
variance.
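
As a rough illustration of the alternative analysis suggested above,
the following sketch fits an ordinary least squares model with two
continuous metric values and a 0/1 dummy variable for test data type.
It is not the analysis performed in this study: the function names,
the metric values, and the response values are all made up for
illustration, and the response is generated from known coefficients
only so that the fit can be checked.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular design matrix")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_ols(rows, y):
    """Least squares fit with an intercept; rows hold the predictors."""
    design = [[1.0] + [float(v) for v in row] for row in rows]
    k = len(design[0])
    xtx = [[sum(x[i] * x[j] for x in design) for j in range(k)]
           for i in range(k)]
    xty = [sum(x[i] * yi for x, yi in zip(design, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Hypothetical data: (logical complexity, computational content %, dummy)
# where the dummy is 0 for black-box data and 1 for white-box data.
true_coeffs = [40.0, -2.0, -0.1, 1.5]
metrics = [(3.0, 20.0), (4.0, 80.0), (15.0, 25.0), (11.0, 75.0)]
rows = [(lc, cc, w) for lc, cc in metrics for w in (0, 1)]
response = [true_coeffs[0]
            + sum(b * v for b, v in zip(true_coeffs[1:], row))
            for row in rows]

coeffs = fit_ols(rows, response)  # intercept, LC, CC, test-data dummy
```

Because the metric values enter directly, the fit does not depend on
where the high/low cut is placed, which is the point of the suggestion.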

The criterion variable for the analysis of variance was the mean
percentage of errors found per hour (MPEH). The comparability of MPEH
among differing programs is suspect, as it is not clear that dividing
the number of program errors found by the number of errors resident
and by the debugging duration leads to comparably normalized measures
of performance. A better (and more costly) experiment could be
devised wherein the same number of errors would be reinserted into the
experimental programs, and the same length of time would be given for
debugging. Any nonlinear effects of duration and error population
sizes could then be mitigated, and linear models of factor effects
could be more confidently applied.
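
The normalization described above can be written out in a few lines.
This is a minimal sketch of the arithmetic only; the function names
are ours, the session figures are invented, and averaging the
per-session rates with equal weight is an assumption about how the
mean was formed.

```python
def percent_errors_per_hour(errors_found, errors_resident, hours):
    """Percentage of the resident errors found per hour of debugging."""
    if errors_resident <= 0 or hours <= 0:
        raise ValueError("errors_resident and hours must be positive")
    return 100.0 * errors_found / errors_resident / hours


def mean_percent_errors_per_hour(sessions):
    """Equal-weight mean over (found, resident, hours) sessions."""
    rates = [percent_errors_per_hour(*session) for session in sessions]
    return sum(rates) / len(rates)


# Hypothetical sessions on a program seeded with 12 errors.
sessions = [(6, 12, 2.0), (3, 12, 1.5)]
mpeh = mean_percent_errors_per_hour(sessions)
```

Two programs with different error populations or session lengths can
yield the same value from quite different raw counts, which is why the
comparability of the measure across programs is open to question.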

The most basic result shown to be of overwhelming generality was
the inequality of expected error discovery times. It has always been
quite clear that some errors are discovered much later than others,
but it had never been conclusively demonstrated that the order of
discovery times was not the result of a random process. Replicating
over debugging sessions was sufficient to show, by inspection, that
certain errors were clearly more obscure. A chi-square analysis bore
out this inequality of expected discovery times, as did the vastly
differing maximum likelihood estimator values for error "difficulty
indexes."

A more specific finding was evident from the analysis performed
upon the ranks assigned, by subject, in order of error discovery. Not
only were certain errors generally discovered earlier or later, but
the order of all error discoveries was similar for all subjects. This
conclusion was based upon the rejection of a random configuration
hypothesis by use of Kendall's coefficient of concordance, and makes
it clear that the equal mean-time-to-failure assumption underlying the
Jelinski-Moranda model is not justified.
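
For readers unfamiliar with the statistic, the sketch below computes
Kendall's coefficient of concordance W for m subjects who each rank
the same n errors in order of discovery, together with the usual
large-sample chi-square statistic m(n - 1)W on n - 1 degrees of
freedom. It assumes complete rankings without ties, which is a
simplification; the function names and the example rankings are
hypothetical.

```python
def kendalls_w(rankings):
    """Kendall's W for m complete, untied rankings of the same n items."""
    m = len(rankings)
    n = len(rankings[0])
    if any(sorted(r) != list(range(1, n + 1)) for r in rankings):
        raise ValueError("each ranking must be a permutation of 1..n")
    totals = [sum(r[j] for r in rankings) for j in range(n)]
    mean_total = m * (n + 1) / 2.0
    s = sum((t - mean_total) ** 2 for t in totals)
    return 12.0 * s / (m ** 2 * (n ** 3 - n))


def concordance_chi_square(rankings):
    """Large-sample statistic m(n - 1)W, referred to chi-square(n - 1)."""
    m = len(rankings)
    n = len(rankings[0])
    return m * (n - 1) * kendalls_w(rankings)


# Hypothetical discovery-order ranks of four errors by three subjects.
agreeing = [[1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3, 4]]
opposed = [[1, 2, 3], [3, 2, 1]]
w_agree = kendalls_w(agreeing)
w_opposed = kendalls_w(opposed)
```

W equals 1 when every subject discovers the errors in the same order
and 0 when the rank totals are all equal, so a value well above what
chance would produce supports rejecting the random configuration
hypothesis.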

A surprising result also reported in the rank analysis was the
high correlation between the two sets of discovery order rank means
associated with groups using black-box and white-box data. Combined
with the fact that the analysis of variance on error discovery counts
showed the groups to be similar, one can only conclude either that
theoretical differences between these approaches are not borne out in
reality, or that the programs were not large enough for the test sets
to represent radically different input subdomains. We suspect that
the answer lies somewhere between.

The greatest benefit of obtaining replications of error discovery
time sets for the same programs was the ability to test software
reliability models defined in terms of known, arbitrarily indexed
errors. In the absence of replicated discovery time ensembles, it is
natural to define failure models in terms of successive interarrival
times. For most published failure models, however, this procedure
makes the hidden assumption that any error is equally likely to occur
(which coincidentally dispels any concern with order statistics).

To employ the replications of discovery times, it was necessary
to find a means of aggregating the performances of quite dissimilar
subjects. The simplest means of doing so was the hypothesis that each
subject's error discovery rate was proportional to that of any other
subject, for all errors. Hence, the discovery rate of a particular
subject for a particular error was assumed to be the product of the
individual's proficiency index p_i and the error's inherent arrival
rate a_j. The maximum likelihood estimates for the set of proficiency
indices p and error difficulty indices a were obtained and used to
transform all discovery times for each program to normalized values
adjusted for individual and error differences. Since many errors went
undiscovered, a distributional test (Gnedenko's) for exponentiality
was used, which employed censored samples. Even after adjusting for
vastly differing error difficulty indices, the normalized discovery
time sets exhibited a decreasing rate of discovery.
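
One way such estimates could be computed is sketched below. It assumes
that subject i discovers error j after an exponentially distributed
time with rate equal to the product of a proficiency index and an
error index, and that an undiscovered error is censored at the time
the session ended. Under those assumptions the likelihood equations
can be solved by alternating updates. The function names, the choice
to fix the scale by giving the proficiency indices a mean of one, and
the data are all ours; this is not necessarily the estimation
procedure used in the study.

```python
def fit_proportional_rates(times, found, iterations=200):
    """Alternating maximum likelihood updates for rate[i][j] = p[i] * a[j].

    times[i][j] is the discovery time of error j by subject i, or the
    censoring time if it was not found; found[i][j] is 1 or 0.
    Assumes positive times and at least one discovery per subject.
    """
    n_subj, n_err = len(times), len(times[0])
    p = [1.0] * n_subj
    a = [1.0] * n_err
    for _ in range(iterations):
        for j in range(n_err):
            exposure = sum(p[i] * times[i][j] for i in range(n_subj))
            a[j] = sum(found[i][j] for i in range(n_subj)) / exposure
        for i in range(n_subj):
            exposure = sum(a[j] * times[i][j] for j in range(n_err))
            p[i] = sum(found[i][j] for j in range(n_err)) / exposure
        # Only the products p[i] * a[j] are identified; fix the scale.
        scale = sum(p) / n_subj
        p = [v / scale for v in p]
        a = [v * scale for v in a]
    return p, a


def normalized_times(times, p, a):
    """Rescale each time by its fitted rate; unit exponential if the model holds."""
    return [[p[i] * a[j] * t for j, t in enumerate(row)]
            for i, row in enumerate(times)]


# Hypothetical data: two subjects, three errors, times in hours.
# The third error was never found and is censored at 2.0 hours.
times = [[0.5, 1.0, 2.0], [1.0, 2.0, 2.0]]
found = [[1, 1, 0], [1, 0, 0]]
p_hat, a_hat = fit_proportional_rates(times, found)
z = normalized_times(times, p_hat, a_hat)
```

An error that no subject finds receives an estimated index of zero,
which is the boundary value the likelihood favors in that case. The
rescaled times are the quantities to which a test of exponentiality
for censored samples would then be applied.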

This result contradicts the assumptions of the Jelinski-Moranda
and Schick-Wolverton models, the former assuming a constant discovery
rate and the latter hypothesizing an increasing discovery rate for
unnormalized discovery times. If anything, the short debugging
durations allotted and learning effects of the subjects would tend to
increase the observed discovery rates over time, and yet a
statistically significant dropoff in these rates was indicated by
Gnedenko's test. The only conclusion that can be drawn from this
finding is that the longer an error goes undiscovered, the lower the
likelihood of its discovery in the near future. This unfortunate
effect is more pronounced when the possibility of varying error
difficulties is introduced, as the more obscure errors will tend to be
found latest in the debugging task and at a slower rate than earlier,
more lucid errors.
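
For reference, the two discovery-rate assumptions being contrasted can
be written out using the forms in which these models are commonly
stated: for Jelinski-Moranda, a rate proportional to the number of
errors remaining and constant between discoveries; for
Schick-Wolverton, the same quantity multiplied by the time elapsed
since the last discovery. The sketch below is illustrative only, and
the parameter names and values are ours.

```python
def jelinski_moranda_rate(phi, n_total, n_found):
    """Discovery rate after n_found removals; does not depend on time."""
    return phi * (n_total - n_found)


def schick_wolverton_rate(phi, n_total, n_found, elapsed):
    """Discovery rate that grows linearly with time since the last discovery."""
    return phi * (n_total - n_found) * elapsed


# Hypothetical parameters: 10 resident errors, 3 already removed.
jm = jelinski_moranda_rate(0.1, 10, 3)
sw_early = schick_wolverton_rate(0.1, 10, 3, 0.5)
sw_late = schick_wolverton_rate(0.1, 10, 3, 2.0)
```

Neither form allows the rate to fall while the same set of errors
remains, which is the behavior the normalized discovery times showed.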

There is evidence in the literature to support this conjecture of
decreasing error discovery rates. Schick and Wolverton (1978) report
on a set of error data for which the best distributional fit was
obtained by employing a Weibull distribution with a decreasing
discovery rate parameter. Littlewood (1978) analyzed a large set of
software failure times and found that the exponential model was not
sufficiently skewed to the right, indicating increasingly greater
overestimation of failure probability as time goes on. Littlewood's
solution was to keep the assumption of exponential arrival times and
impose the assumption that successive errors' failure rates were
random variables distributed by related gamma distributions.

We proposed a software reliability model somewhat related to that
of Littlewood. Error difficulty indices were assumed to be gamma
distributed and to serve as the parameters for exponentially
distributed discovery times (not interarrival times, as assumed by
Littlewood). From our order statistic based model we derived maximum
likelihood estimates of the number of resident errors, and these were
much larger than the number of known errors. We believe that it is
the assumption of exponentiality that underlies the failure of our
model in this regard, as well as that of the Jelinski-Moranda and
Schick-Wolverton models. We further believe that the adequate
distributional fits reported by Littlewood (1973) are the result of
the exceedingly flexible model which he employs, and we expect to find
after further research that his method is as inaccurate in predicting
the number of resident errors as it is unrepresentative of the
phenomenon that he is attempting to model.
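
A standard consequence of the gamma-exponential structure described
above may help in reading this argument. If a discovery time is
exponential given its rate, and the rate is gamma distributed with a
given shape and rate parameter, then the discovery time of a randomly
chosen error has survival function (rate / (rate + t)) ** shape and
hazard shape / (rate + t). The sketch below simply evaluates these
closed forms; the parameter names and values are ours and are not
estimates from the study.

```python
def marginal_survival(t, shape, rate):
    """P(T > t) for an exponential time whose rate is Gamma(shape, rate)."""
    return (rate / (rate + t)) ** shape


def marginal_hazard(t, shape, rate):
    """Hazard of the mixture; it decreases as t grows."""
    return shape / (rate + t)


# Hypothetical parameters for illustration.
shape, rate = 2.0, 3.0
hazards = [marginal_hazard(t, shape, rate) for t in (0.0, 1.0, 10.0)]
```

The pooled hazard falls over time even though each individual error
has a constant rate, because the easier errors tend to be found first.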

We now believe more strongly than before in the strength of a
generalized order statistic-based software reliability model. The
conformance of error discovery rank orders across subjects leads us
to conclude that errors are indeed individuals and that probability
models must be formulated for actual errors and not associated with
the order in which errors are observed (as implicitly assumed with
models which employ interarrival times). The rejection of the
assumption of discovery time exponentiality causes the theoretical
collapse of the Littlewood and Jelinski-Moranda models, as
exponentiality of discovery times is a necessary condition for
exponentiality of interarrival times. Our model need only be adjusted
by finding the proper distributions for errors' shape parameters (which
clearly differ from error to error), and the discovery or arrival time
distribution best suited to employ these (random) parameters. It is
quite possible that the flexibility in discovery rate afforded by the
Weibull distribution may serve this purpose. Our own further research
will proceed along these lines, and these arguments are offered to the
rest of the software reliability community as suggestions for further
study.
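
The flexibility of the Weibull distribution mentioned above can be seen from its hazard (instantaneous discovery rate) function. The following is a minimal sketch for illustration only; the function name and the parameter values are assumptions and are not quantities estimated in this study. A shape parameter below 1 gives a discovery rate that decreases over time, while a shape of exactly 1 reduces to the constant rate of the exponential distribution.

```python
def weibull_hazard(t, shape, scale):
    """Weibull hazard h(t) = (shape/scale) * (t/scale)**(shape - 1), t > 0."""
    if t <= 0 or shape <= 0 or scale <= 0:
        raise ValueError("t, shape and scale must all be positive")
    return (shape / scale) * (t / scale) ** (shape - 1)


# Illustrative values only: shape < 1 models a falling discovery rate.
for t in (1.0, 4.0, 16.0):
    print(t, weibull_hazard(t, shape=0.5, scale=1.0))
```

With shape set to 1.0 the same function returns 1/scale for every t, which is the exponential case rejected above.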


In summary, the findings of this study have shown that the
quality of program testing and debugging is substantially affected by
the proficiency of the individual engaged in this activity and
characteristics of the software under test.

The most prolific error discoverers were generally well-educated
with technical backgrounds. The range of proficiency was
broad, with the best subject finding errors at 20-30 times the rate
of the worst. Neither the subject's usual programming language nor
his normal computing mode (batch or interactive) had any effect on
performance. Moreover, the subjects' own assessment of their skills
and proficiencies had no bearing on the types or number of errors that
were found. Subjects with recent experience in sophisticated
applications generally performed best on all the experimental programs
and there is every reason to believe that this finding carries over
into professional environments.

The strongly adverse effect that logical complexity was found
to exert on program debugging effectiveness suggests that the
application of structured programming and use of block-structured
languages can help improve software testing activities, in addition to
their other acknowledged benefits (Meyers, 1976).

The type of test data employed in debugging had little effect
on subject performance in this study, although the particular errors
found differed from one test data type to the other. Hence, we
suggest that both "black-box" and "white-box" methods be employed in
practice to maximize testing coverage. We are joined in this opinion
by Meyers (1979).


One of the most interesting findings of this study was the
decided nonhomogeneity of the error population for each program. A
hierarchy among errors, based upon their apparent difficulty, was
confirmed by subject after subject, as evidenced by the concordance
among the subjects' orderings of error discoveries. On the basis of
this finding, we conclude that the remainder of any debugging effort is
time-consuming not only because there are fewer errors to find, but
also because those errors remaining are inherently more subtle. This
supposition contradicts the assumptions of the Jelinski-Moranda
software reliability model, which assumes that the software failure
rate during test is directly proportional to the number of remaining
errors. In fact, after adjusting error discovery times for differences
in error difficulties we found that the discovery rate still decreased
over time. This conclusion is in accord with the debugging experiences
of the practicing programming community and better explains the
propensity for software projects to conclude the testing activity over
budget.
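
For comparison, the Jelinski-Moranda assumption can be sketched as follows. In that model, as it is usually stated, every remaining error contributes the same constant amount to the failure rate, so the rate before the i-th discovery is phi * (N - i + 1) and the expected wait is its reciprocal. The function name and the values of N and phi below are illustrative assumptions, not estimates from the experimental data.

```python
def jm_expected_intervals(n_errors, phi):
    """Expected time between successive discoveries under Jelinski-Moranda.

    Before the i-th discovery (i = 1..N) there are N - i + 1 errors left,
    each contributing the same rate phi, so the expected wait is
    1 / (phi * (N - i + 1)).
    """
    if n_errors <= 0 or phi <= 0:
        raise ValueError("n_errors and phi must be positive")
    return [1.0 / (phi * remaining) for remaining in range(n_errors, 0, -1)]


# Illustrative values only: 4 resident errors, per-error rate 0.5.
print(jm_expected_intervals(4, 0.5))
```

Under this assumption the waits lengthen only because fewer errors remain; the finding above is that they lengthen further because the remaining errors are also harder to find.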

We conclude that all but the most trivial programs contain
errors whose mean-times-to-discovery differ substantially. This
being the case, the time necessary to find the first error, the last
error, or any error between is an order statistic whose distribution
is based upon the underlying distributions of the individual errors.
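
The order statistic view can be illustrated with a small simulation. This is a sketch under stated assumptions: each error is given its own exponential discovery time whose rate is drawn from a gamma distribution, which is only one possible choice of underlying distributions, and the function names, seed, and parameter values are hypothetical.

```python
import random


def sample_difficulty_rates(n_errors, shape, scale, rng):
    """Draw one discovery rate per error from a gamma distribution."""
    return [rng.gammavariate(shape, scale) for _ in range(n_errors)]


def simulate_discovery_order(rates, rng):
    """Return (time, error_index) pairs sorted by discovery time.

    Each error gets its own exponential discovery time; sorting the
    times yields the order statistics (first found, second found, ...).
    """
    draws = [(rng.expovariate(rate), idx) for idx, rate in enumerate(rates)]
    return sorted(draws)


rng = random.Random(1980)  # fixed seed so the run is repeatable
rates = sample_difficulty_rates(n_errors=8, shape=2.0, scale=0.05, rng=rng)
for rank, (t, idx) in enumerate(simulate_discovery_order(rates, rng), start=1):
    print(f"discovery {rank}: error {idx} at t = {t:.1f} (rate {rates[idx]:.3f})")
```

Because the times belong to individual errors, repeating the simulation with the same rates tends to place high-rate (easy) errors early and low-rate (subtle) errors late, which is the kind of rank-order agreement across subjects described above.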


We believe that our own attempt to fit an order statistic-based model
failed due to the decreasing discovery rate exhibited by the
experimental data, even after adjusting for varying difficulties. It is


BIBLIOGRAPHY 




Alberts, D. S. The economics of software quality assurance.
Proceedings of the 1976 National Computer Conference. New York: AFIPS

Ashcroft, E., & Manna, Z. The translation of GOTO programs to WHILE
programs. Computer Science Report Number CS-188 (Stanford
University), 1970.


Baker, F. T. Structured programming in a production environment.
IEEE transactions on software engineering. New York: IEEE Press,
June 1975. Pp. 241-252.

Bard, Y. Nonlinear parametric estimation. New York: Academic Press,
1974.

Basin, S. L. Estimation of software error rates via capture-recapture
sampling. Palo Alto, Calif.: Science Applications, September 1973.


Bell, T. E., & Thayer, T. A. Software requirements: Are they really a
problem? Proceedings of the Second International Conference on
Software Engineering. New York: IEEE Press, 1976. Pp. 61-68.

Berge, C. Graphs and hypergraphs. Amsterdam: North-Holland, 1973.

Boehm, B. W. Software and its impact: A quantitative assessment.
Datamation, May 1973, 19(5), 48-59.

Boehm, B. W., McClean, R. K., & Urfrig, D. B. Some experience with
automated aids to the design of large-scale reliable software.
IEEE transactions on software engineering. New York: IEEE Press,
March 1975. Pp. 125-133.

Boyer, R. S., Elspas, B., & Levitt, K. N. SELECT: A formal system for
testing and debugging programs by symbolic executions. Proceedings
of the 1975 International Conference on Reliable Software. New
York: IEEE Press, 1975. Pp. 234-245.


Brooks, F. P. The mythical man-month. Reading, Mass.: Addison-Wesley

Brown, A. R., & Sampson, W. A. Program debugging. London: MacDonald,
1975.

Brown, J., & Lipow, M. Testing for software reliability. Proceedings
of the 1975 International Conference on Reliable Software.
New York: IEEE Press, 1975. Pp. 518-527.

Brown, J. R. Why tools? Proceedings of the Eighth Annual Symposium on
Computer Science and Statistics, UCLA, Los Angeles, February 12-13,
1975, pp. 34-42.

Caine, S. H., & Kent, E. PDL - A tool for software design. Proceedings
of the 1975 National Computer Conference. New York: AFIPS, 1975.
Pp. 314-319.

Christofides,  N.  Graph  theory — An  algorithmic  approach.  New  York: 
Academic Press, 1975.

Clarke, L. A system to generate test data and symbolically execute
programs (CS-060-75). Boulder: University of Colorado, Department
of Computer Science, February 1975.

Clarke, L. Generating test data and symbolically executing programs
written in ANSI FORTRAN. IEEE transactions on software engineering.
New York: IEEE Press, September 1976. Pp. 194-207.

Clarke, L. A. Testing: Achievements and frustrations. Proceedings of
the Second International Computer and Applications Conference.
New York: IEEE Press, November 1978. Pp.

The computer industry. Waltham, Mass.: International Data Corporation,
1974.

Cornell, L., & Halstead, M. H. Predicting the number of bugs expected
in a program module (CSD-TR/205). Lafayette, Ind.: Purdue
University, October 1976.

Dahl, O.-J., Dijkstra, E. W., & Hoare, C. A. R. Structured programming.
London: Academic Press, 1972.

David, H. A. Order statistics. New York: Wiley, 1970.

Davis, C. G., & Vick, C. R. The software development system. IEEE
transactions on software engineering. New York: IEEE Press,
January 1977. Pp. 69-84.

Dorn, P. H. 1979 budget survey. Datamation, January 1979, 25(1), 67-89.


Elspas, B. A comparison of formal verification, symbolic execution,
and formal testing. Unpublished note, SRI, June 1977.

Elspas, B., Levitt, K. N., & Waldinger, R. J. An assessment of the
techniques for proving program correctness. Computing Surveys,
June 1972, 4(2), 13-34.

Fosdick, L., & Osterweil, L. J. Data flow analysis in software
reliability. Computing Surveys, September 1976, 8(3), 305-330.

Fujii, M. Independent verification of highly reliable programs.
Proceedings of the First International Computer and Applications
Conference. New York: IEEE Press, 1978. Pp. 38-44.

Funami, Y., & Halstead, M. H. A software physics analysis of Akiyama's
debugging data (CSD-TR/44). Lafayette, Ind.: Purdue University,
May 1975.

Gabow, H. H., Maheshwari, S. N., & Osterweil, L. J. On two problems in
the generation of program test paths. IEEE transactions on soft-

rfect debugging model for reliability and other
quantitative measures of software systems (RADC-TR-76-155).
New York: Rome Air Development Center, 1978.

Goodenough, J. B., & Gerhart, S. L. Toward a theory of test data
selection. IEEE transactions on software engineering. New York:
IEEE Press, June 1975. Pp. 156-173.

Green, T. F., Schneidewind, N. F., Howard, G. T., & Pariseau, R.
Program structures, complexity, and error characteristics.
Proceedings of the Symposium on Computer Software Engineering.
New York: Polytechnic Press, 1976. Pp. 139-154.

Haberman, A. N. Path expressions (Tech. report). Pittsburgh, Pa.:
Carnegie-Mellon University, 1975.

Halstead, M. H. Elements of software science. New York: Elsevier, 1977.

Harary, F. Graph theory. Reading, Mass.: Addison-Wesley, 1972.

Hetzel, W. C. Program test methods. Englewood Cliffs, N.J.: Prentice-
Hall, 1973.

Hetzel, W. C. Comparing and analyzing techniques for detecting errors
in computer programs. Unpublished doctoral dissertation, Duke
University, 1975.

Hoffman, R. H. Automated verification system user's guide. TRW note,
72-FMT-891, 1972.

Hoffman, R. H. User information for interactive automated test data
generator (ATDG) system. NASA Johnson Space Center, internal note


posium on Computer Software Reliability. New York: IEEE Press,
1973. Pp. 78-&L.


Kennedy, J. E. A survey of automated computer program verification
tools. Aerospace Corporation, August 15, 1974.

King, J. A new approach to program testing. Proceedings of the 1975
International Conference on Reliable Software. New York: IEEE
Press, 1975. Pp. 226-233.

King, J. Symbolic execution and program testing. Communications of the
ACM, July 1976, 18(7), 2>30.

Knuth, D. E. An empirical study of FORTRAN programs (TR CS-186).
Palo Alto: Stanford University, 1971.

Krause, K. W., Smith, R. W., & Goodwin, M. A. Optimal software test
planning through automated network analysis. Proceedings of the
IEEE Symposium on Computer Software Reliability. New York: IEEE
Press, 1973. Pp. 18-22.

Littlewood,  B.  Theories  of  software  reliability:  How  good  are  they 
and  can  they  be  improved?  Unpublished  manuscript,  February  1979. 

Littlewood, B., & Verrall, J. A Bayesian reliability growth model for
computer software. Applied Statistics: Journal of the Royal
Statistical Society, Series C, 1973, 22(3), 212-231.

Love,  L.  T.,  &  Bowman,  A.  B.  An  independent  test  of  the  theory  of 
software  physics.  ACM  SIGPLAN  Notices,  November  1976,  pp.  4-9. 

McCabe, J. T. A complexity measure. IEEE transactions on software
engineering. New York: IEEE Press, December 1976. Pp. 308-320.

Meyers, G. The art of software testing. New York: Wiley, 1979.

Miller, E. F., & Melton, R. A. Automated generation of testcase
datasets. Proceedings of the 1975 International Conference on Reliable
Software. New York: IEEE Press, 1975. Pp. 51-58.

Miller, E. F., Paige, M. R., & Benson, J. P. Structural techniques of
program validation. Digest of Papers: 13th IEEE Computer Society
International Conference. New York: IEEE Press, 1976. Pp. 31-40.

Mills, H. D. On the statistical validation of computer programs.
IBM FSD unpublished paper, July 1970.

Mills, H. D. Mathematical foundations for structured programming.
IBM Corporation, Federal Systems Division, FSC 71-5108, 1971.

Monn, R., Schaeffer, N., & Singpurwalla, M. Reliability theory and
statistics. New York: Holt, Rinehart & Winston, 1977.

Moranda, P. B. Prediction of software reliability during debugging.
Proceedings of the 1975 Annual Reliability and Maintainability
Symposium, 1975. Los Angeles, September 18-20, pp. 27-50.

Moranda, P. B. Limits to program testing with random number inputs.
Proceedings of the Second International Computer and Applications
Conference. New York: IEEE Press, 1978. Pp. 521-526.

Musa, J. D. A theory of software reliability and its application.
IEEE transactions on software engineering. New York: IEEE Press,
September 1975. Pp. 212-230.

Nelson, E. C. A statistical basis for software reliability assessment.
TRW Software Series, TRW-SS-73-05, March 1973.

Osterweil,  L.  J.,  &  Fosdick,  L.  D.  DAVE — A  validation  error  detection 
and  documentation  system  for  FORTRAN  programs.  Software  Practice 
and  Experience,  1976,  6,  35-52. 

Paige, M. R. Program graphs, an algebra, and their implication for
programming. IEEE transactions on software engineering. New York:
IEEE Press, September 1975. Pp. 286-291.

Phister, M. Data processing technology and economics. Santa Monica.

New York: IEEE Press, April-May 1973. Pp. 28-37.


Ramamoorthy, C. V., & Ho, S. F. Testing large software with automated
software evaluation tools. IEEE transactions on software
engineering. New York: IEEE Press, March 1975. Pp.

Ramamoorthy, C. V., Ho, S. F., & Chen, W. T. On the automated
generation of program test data. IEEE transactions on software
engineering. New York: IEEE Press, December 1976. Pp. 293-300.

Ramamoorthy, C. V., Kim, K. H., & Chen, W. T. Optimal placement of
software monitors aiding systematic testing. IEEE transactions on
software engineering. New York: IEEE Press, December 1975.
Pp. 403-410.

Reifer, D. J., & Trattner, S. A glossary of software tools and
techniques. Computer, July 1977, 18(2), 121-131.

Ross, D. T. Guest editorial - reflections on requirements. IEEE
transactions on software engineering. New York: IEEE Press, January
1977. Pp. 2-5. (a)

Ross, D. T. Structured analysis: A language for communicating ideas.
IEEE transactions on software engineering. New York: IEEE Press,
January 1977. Pp. 16-33. (b)

Ross, D. T., & Schoman, K. E. Structured analysis for requirements
definition. IEEE transactions on software engineering. New York:
IEEE Press, January 1977. Pp. 26-30.

Rubey, R., Dana, J., & Riche, P. Quantitative aspects of software
validation. IEEE transactions on software engineering. New York:
IEEE Press, June 1975. Pp. 26-30; 150-155.


Schick, G. S., & Wolverton, R. W. Assessment of software reliability.
TRW Software Series, TRW-SS-73-04, September 1972.

Schick, G. S., & Wolverton, R. W. An analysis of competing software
reliability models. IEEE transactions on software engineering.
New York: IEEE Press, March 1978. Pp. 104-120.

Shooman, M. L., Schick, G. S., & Wolverton, R. W. Types, distribution,
and test and correction times for programming errors. IEEE
transactions on software engineering. New York: IEEE Press, March 1975.
Pp. 11-18.

Stay, J. F. HIPO and interactive program design. IBM Systems Journal,
September 1976, pp. 1-17.

Stevens, W. P., Myers, G. J., & Constantine, L. L. Structured design.
IBM Systems Journal, March 1974, pp. 12-27.

Stucki, L. G. New directions in automated tools for improving software
quality. In Current trends in programming methodology. Englewood
Cliffs, N.J.: Prentice-Hall, 1977.

Sullivan, J. E. Measuring the complexity of computer software.
Bedford, Mass.: Mitre Corporation, June 1973.

Teichroew, D., & Hershey, E. A. III. PSL/PSA: A computer-aided
technique for structured documentation and analysis of information
processing systems. IEEE transactions on software engineering.
New York: IEEE Press, January 1977. Pp. 18-29.

Thayer, T. A., Lipow, M., & Nelson, E. C. Software reliability study
(TRW Software Series, TRW-SS-76-03). Redondo Beach, Calif.: TRW,
March 1976.

Walker,  M.  G.  A  theory  for  reliable  software.  Datamation,  September 
1978,  pp.  12-13. 

Walston, P., & Felix, J. Error distributions in program codes.
Proceedings of the Second International Computer and Applications
Conference. New York: IEEE Press, 1978. Pp. 123-126.

Wegner, P. Research directions in software technology. Proceedings
of the Third International Conference on Software Engineering.
New York: IEEE Press, 1978. Pp. 240-259.

Zelkowitz, M. V. Perspectives on software engineering. ACM Computing
Surveys, June 1978, 10(2), 1-13.


APPENDICES 




APPENDIX  A 

QUESTIONNAIRE  INSTRUCTIONS 




1.  NAME 

2.  AGE 

3.  SEX 

4.  GENERAL  EDUCATIONAL  BACKGROUND 

List, from earliest educational experience to most recent, the
periods in your life in which you were enrolled in some form of
schooling. Examples are given below for each column of the
table you are requested to fill in.

Educational Environment   Examples include High School, Undergraduate,
                          Training School (CDI, etc.), Graduate School

Period of Enrollment      Examples include Sept. 1975-June 1978, academic
                          years of 1973 to 1975

Major or Emphasis         Examples include Music, Data Processing,
                          Computer Science, None (if applicable)

Title of Degree           Examples include B.S., MBA, PhD, None
                          (if applicable)


5.  EDUCATIONAL  BACKGROUND  IN  COMPUTER-RELATED  STUDIES 

List  the  number  of  units  earned  in  courses  directly  related  to 
studies  of  computer  systems,  programming,  theory,  or  applications. 

State  the  number  of  units  in  terms  of  quarter  units  or  semester 
units. 

Estimate, for each period of enrollment, the percentage of your
course work which required significant amounts of programming
(say, writing one large program or 5 to 10 smaller ones).


6.  EDUCATIONAL  BACKGROUND  IN  PROGRAMMING  THEORY  AND  PRACTICE 

List  the  number  of  units  earned  in  courses  dealing  directly 
with  topics  in  programming  theory  and  practice. 

Examples  of  such  courses  include  language  courses  (COBOL,  FORTRAN, 
BASIC),  Algorithms,  Data  Structures,  Programming  Principles. 

Courses such as Systems Analysis, Numerical Analysis, Computing
Theory or Management Information Systems do not qualify.

Include any courses that you have taught but not formally taken.

7. PROFESSIONAL EXPERIENCE IN INFORMATION SCIENCES AND DATA PROCESSING

List all distinct experiences that you have had in the data
processing and computer science field, not required by courses that
you have taken. These experiences may include jobs, consulting
engagements, contract work, independent development of skills
(such as computing-as-a-hobby), or the donation of your skills.

For each separate experience, state a short description or job
title, the duration of the experience in man-months, the time
period which this experience spanned, and the percentage
of these man-months devoted to program writing and debugging.

For a part-time job of, say, 10 hours a week in which 1/2 of the
work was system design and 1/2 programming over a period from
June 1, 1977 to August 1, 1978, an entry in the table would appear as

Job Title            Job Duration   Time Period        % of Time Programming

Programmer/Analyst   14/4 = 3.5     June 1, 1977       50%
(part-time)          man-months     to Aug. 1, 1978

8.  PROGRAMMING  LANGUAGE  EXPERIENCE 

Of all the programming required of you in educational environments
(see question 5), estimate the percentage of the time you
used each of the programming languages listed in the table provided
for this question. Reply by filling in the table under
the column titled EDUCATIONAL.

Of the programming required of you during your professional
experiences, estimate the percentage of the time you used each of
the programming languages listed in the table provided for this
question. Reply by filling in the table under the column titled
PROFESSIONAL.

9. BATCH/INTERACTIVE EXPERIENCE

We  define  a  BATCH  environment  as  a  program  development  and  test 
environment  in  which 

•  programs  are  coded  and  keyed  onto  cards 

•  the  programmer  compiles  and/or  runs  the  program  to 
obtain  a  hard  copy  listing  and/or  program  output 

•  the program is reviewed and debugged by inspecting
the listing for some length of time

•  modifications  are  made  to  the  program  card  deck  to 
reflect  desired  program  changes 

•  the  program  deck  is  resubmitted  to  begin  the  cycle  again 

We  define  an  INTERACTIVE  environment  as  a  program  development  and  test 
environment  in  which 




•  programs  are  entered  into  a  terminal 

•  programs  are  run  at  the  terminal  to  obtain  program  results 

•  results  are  studied  at  the  terminal 

•  the  program  is  listed  at  the  terminal  and  edited  to  reflect 
desired  program  changes 

•  the  cycle  is  re-begun  by  running  the  program 

What percentage of your programming experience was acquired in
situations more closely resembling a batch environment, and what
percentage was acquired in situations more closely resembling an
interactive environment?

10.  SELF  ESTIMATION 

How  would  you  estimate  your  abilities,  proficiency,  and  knowledge 
in  each  of  the  following  areas? 

•  data processing principles

•  computer science principles

•  systems analysis

•  program design

•  program writing

•  program debugging

•  operations research

•  statistics and probability

•  mathematics

•  file handling

•  algorithms


QUESTIONNAIRE 


1.  NAME:  _ 

2.  AGE:  _ 

3.  SEX:  Male _  Female _

4.  GENERAL  EDUCATIONAL  BACKGROUND 


5.  EDUCATIONAL  BACKGROUND  IN  COMPUTER-RELATED  STUDIES 


Educational    Period of                Semester or      Requiring
Environment    Enrollment    Units      Quarter Units    Programming


6.  EDUCATIONAL BACKGROUND IN PROGRAMMING THEORY AND PRACTICE


LANGUAGE  EXPERIENCE 


DATA  PROCESSING  PRINCIPLES 

COMPUTER  SCIENCE  PRINCIPLES 

SYSTEMS  ANALYSIS 

SYSTEMS  DESIGN 

PROGRAM  DESIGN 

PROGRAM  WRITING 

PROGRAM  DEBUGGING 

OPERATIONS  RESEARCH 

PROBABILITY  AND  STATISTICS 

MATHEMATICS 

FILE  HANDLING 


ALGORITHMS 




QUESTIONNAIRE II

Please devote some time to read the program specifications included in the packet.
I am sure that you will find the experiment much more enjoyable if you have prepared
by studying these specifications to determine what it is the programs are supposed
to do.

After reading and studying the program specifications, please indicate in the boxes
below the extent of your familiarity with the "theory" behind these applications,
as well as the amount of prior experience you have had in programming problems of
these kinds.

Now that you have read and studied the program specifications for each of the
programs (SCNR, OPEM, LNPR, and ITAX)...

(1) How familiar are you with the "theory" used to solve these problems? Your
familiarity with the principles underlying these programs may have come
from your schooling, reading, or research on the job. Please answer below
under THEORY.

(2) What is the extent of your experience in designing, programming, and/or
debugging programs which dealt with problems like those addressed by the
experimental programs? Your experience may have been acquired in school,
on the job, or through recreational computing. Please answer below under
PRACTICE.


THEORY


PRACTICE


APPENDIX  B 

PROGRAM  METRICS 




COMPUTATIONAL  CONTENT 


The degree of computational content was measured by an "operator
ratio" motivated by Halstead's approach (1977) to decomposing programs
into indivisible tokens. Each basic construct in a programming
language is categorized as one of two token types: operators or
operands. An operand is any object of an action and includes data
variables and constants. Operators are program symbols which have zero
or more operands as functional arguments, and affect program execution.

Each indivisible construct in the BASIC language was assigned
membership in the operator or operand class. Some constructs,
though always appearing together, were counted as two or more operators
if one of the constructs could appear separately as an operator.
String and numeric variables, IMAGE strings and unvarying data for
tables were counted as operands. Operators were dichotomized into two
classes: computational and noncomputational. Computational operators
are those program tokens which take one or more data as operands and
produce a result. Noncomputational operators are all others, which
affect the order of program execution (control), provide input or
output (I/O), or provide definition (declarative). A list of the
specific operators recognized and a designation of category is given
in Table B.1.

A correction factor suggested by Halstead was applied to operator
and operand counts. Halstead reasons that any common sub-expression
used repeatedly represents a value to a programmer and is not as
computationally complex as its constituent operators may imply. With
this in mind, the operator and operand counts were reduced by those
tokens associated with common expressions in close textual proximity
to one another.

The "operator ratio" was calculated as the ratio of the number
of computational operators to the total number of operators. It was
assumed that the higher this ratio becomes, the greater is the degree
of "data processing" per program symbol. Operator ratios were
calculated for a dozen programs from the Hewlett Packard Contributed
Library, chosen to represent a variety of assumed degrees of
computation. The operator ratios computed for this sample ranged from
55% to 85%, with most in the 65% to 75% interval.

On this basis programs with ratios of 65% or less were designated
as having low computational content and those with 75% or more
were designated as having high computational content.
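
The calculation just described can be sketched in a few lines. This is an illustration rather than the procedure actually used to tally the BASIC programs: the function names are hypothetical, and the label returned for ratios strictly between the two thresholds is an assumption, since the text names only the low and high classes.

```python
def operator_ratio(computational, noncomputational):
    """Ratio of computational operators to all operators."""
    total = computational + noncomputational
    if total <= 0:
        raise ValueError("at least one operator is required")
    return computational / total


def content_class(ratio, low=0.65, high=0.75):
    """Classify a ratio using the 65% and 75% thresholds."""
    if ratio <= low:
        return "LOW"
    if ratio >= high:
        return "HIGH"
    return "INTERMEDIATE"  # assumed label; not named in the text


# Hypothetical counts for a single program.
ratio = operator_ratio(computational=120, noncomputational=80)
print(f"{ratio:.0%}", content_class(ratio))
```

The thresholds are passed as parameters so that a different sample of library programs could supply its own cut-offs.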

The four programs eventually employed as the experimental
programs were judged as the best experimental instruments on the basis
of their operator ratios, as well as other factors. The operator ratios
for these programs are displayed in Table B.2.


Table B.2

Operator Ratios for the Experimental Programs

           Number of       Number of
           Computational   Noncomputational
Program    Operators       Operators          Total   Ratio   Designation

ITAX       166             83                 249     67%     LOW
LNPR       322             112                434     74%     HIGH
OPEM       162             30                 192     84%     HIGH
SCNR       254             158                412     62%     LOW

As can be seen from Table B.2, ITAX and OPEM had higher ratios than
the presumed ideals for membership in their respective classes, while
LNPR and SCNR had lower than ideal ratios.

This phenomenon occurred repeatedly during the selection of the
experimental programs and can be attributed to the effect of program
length on the ratios. During the comparison of Library programs and
development of programs as experimental candidates, it became evident
that higher operator ratios became more difficult to find as programs
of greater and greater length were observed. On this basis, the
resulting experimental programs were deemed sufficient as representative
of their respective computational content classes, even though
differing slightly within like classes.

LOGICAL  COMPLEXITY 

Two measures were employed in the assessment of logical
complexity for experimental program candidates. The simpler of these
two was McCabe's complexity measure (1976) corresponding to the
cyclomatic number of a program's control graph or flowchart. In
practice, this can be calculated as e - n + 1, where n equals the
number of blocks in a flowchart and e the number of control flow arcs.
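
A minimal sketch of the e - n + 1 calculation follows. The function name and the example flowchart are hypothetical. One caveat on conventions: McCabe's measure is commonly written e - n + 2 for a flowchart with a single entry and exit, and the two forms agree when the arc count e includes one closing arc from the exit block back to the entry block. Whether the counts in this appendix included such an arc is not stated, so the example shows both.

```python
def cyclomatic_number(arcs):
    """Return e - n + 1 for a control graph given as (source, target) arcs."""
    arcs = list(arcs)
    blocks = {block for arc in arcs for block in arc}
    return len(arcs) - len(blocks) + 1


# Hypothetical flowchart: one IF-THEN-ELSE between an entry and an exit block.
if_then_else = [("entry", "then"), ("entry", "else"),
                ("then", "exit"), ("else", "exit")]

# Adding an arc from exit back to entry makes the graph strongly connected;
# with it, e - n + 1 equals the e - n + 2 form applied to the open graph.
closed = if_then_else + [("exit", "entry")]

print(cyclomatic_number(if_then_else), cyclomatic_number(closed))
```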

A measure developed by TRW (1973) is much more sophisticated
and takes into account direction of program branching as well as the
degree of nesting at the point of conditional transfer. TRW's Logical
Complexity Metric is computed as:

    L/S + C_1 + C_2 + B/1000

where

    L   = number of logical statements (GOTO, GOSUB, IF, FOR-NEXT)
    S   = number of executable statements
    C_1 = measure of loop complexity
    C_2 = measure of IF statement complexity
    B   = number of separate paths from any segment to any other.

C_1 and C_2 are weighted sums of the number of loops and conditional
branches (respectively) at each level of nesting.
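
The formula can be evaluated directly once its five terms are known. The sketch below is illustrative: the function name and the input values are hypothetical, and because the nesting weights behind the loop and IF complexity terms are not given here, those two terms are passed in as precomputed numbers.

```python
def trw_logical_complexity(L, S, C1, C2, B):
    """Evaluate L/S + C1 + C2 + B/1000.

    L  : number of logical statements
    S  : number of executable statements (must be positive)
    C1 : loop complexity term, supplied precomputed
    C2 : IF statement complexity term, supplied precomputed
    B  : number of separate paths between segments
    """
    if S <= 0:
        raise ValueError("S must be positive")
    return L / S + C1 + C2 + B / 1000


# Hypothetical inputs, not taken from the experimental programs.
print(trw_logical_complexity(L=10, S=40, C1=0.5, C2=0.25, B=500))
```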

The complexity metric values calculated for each experimental
program are given below in Table B.3:

Table  B.3 

Complexity Metrics for the Experimental Programs

[Table body not recoverable; surviving column headings: "McCabe's Metric" and "Metric".]


Like the operator ratio, the complexity metrics appeared to be
sensitive to program length: longer programs invariably resulted in greater
metric values. Since, however, logical complexity is normally believed
to increase with the size of the task, this property of the metrics did
not appear inappropriate.




APPENDIX  C 

EXPERIMENTAL  PROGRAM  SPECIFICATIONS 
AND  LISTINGS 




Program  Processing 

Taxes are payable on an amount equal to the total of sources minus the
total of offsets. Total sources is equal to the sum of

(state and federal)   1. Alimony received
(federal only)        2. State refund
(state and federal)   3. Salary and wages
(state and federal)   4. Any excess of dividends received over $100.
(state and federal)   5. Any excess of gambling winnings over losses.
(state and federal)   6. One-half of the excess of long-term
                         capital gains over losses and carryover.
(state and federal)   7. One-half of the excess of short-term
                         capital gains over losses and carryover.
(state and federal)   8. Interest earned
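A minimal Python sketch of the sources rule follows. It is written from the list above, not from the program listing, and every name in it is an illustrative assumption. "Excess of X over Y" is read as max(X - Y, 0).

```python
def excess(amount, threshold):
    """Excess of amount over threshold, never negative."""
    return max(amount - threshold, 0)


def total_sources(alimony, state_refund, salary, dividends,
                  gambling_winnings, gambling_losses,
                  lt_gains, lt_losses, lt_carryover,
                  st_gains, st_losses, st_carryover, interest):
    """Return total sources for each jurisdiction as a dict."""
    common = (alimony
              + salary
              + excess(dividends, 100)
              + excess(gambling_winnings, gambling_losses)
              + excess(lt_gains, lt_losses + lt_carryover) / 2
              + excess(st_gains, st_losses + st_carryover) / 2
              + interest)
    # The state refund counts as a source for the federal return only.
    return {"state": common, "federal": common + state_refund}


sources = total_sources(alimony=1000, state_refund=200, salary=30000,
                        dividends=150, gambling_winnings=500,
                        gambling_losses=700, lt_gains=4000, lt_losses=1000,
                        lt_carryover=1000, st_gains=100, st_losses=300,
                        st_carryover=0, interest=250)
```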

Total offsets is equal to the sum of

(state and federal)   1. The lesser of $5000 and the excess of
                         long-term losses and carryover over
                         long-term gains.
(state and federal)   2. The excess of a + b + c over 3% of
                         earned income
                         a. the lesser of hospitalization cost
                            and $2000
                         b. the excess of drugs and medicine
                            expense over [illegible] of earned income
                         c. ½ of medical insurance cost
(state and federal)   3. Casualty loss
(state and federal)   4. The lesser of charitable deductions
                         and 25% of earned income
(state and federal)   5. Real and personal tax
(federal only)        6. State income tax paid
(federal only)        7. State gas tax
(state only)          8. Federal gas tax
(state and federal)   9. Alimony paid
(state and federal)  10. Employee business expense

The  rates  of  tax  on  taxable  income  are  given  below  as  well  as  the 
definitions  of  program  variables: 




Taxes Payable/Refundable
    Federal                        P0
    State                          P1
Withholding
    Federal                        W0
    State                          W1
Alimony Received                   A0
State Refund Received              R
Earned Income
    Salary and Wages               Y0
    Dividends                      Y1
    Interest Earned                Y2
Gambling
    Winnings                       G0
    Losses                         G1
Capital Transactions
    Long Term
        Gains                      C0
        Losses                     C1
        Carryover                  C2
    Short Term
        Gains                      C3
        Losses                     C4
        Carryover                  C5
Deductions
    Health
        Drugs and Medicine         D0
        Hospitalization            D1
        Health Insurance           D2
    Casualty Loss                  D3
    Charitable                     D4
    Real and Personal              T0
    State Income                   T1
    State Gas                      T2
    Federal Gas                    T3
    Alimony Paid                   A1
    Employee Business Expense      E

Federal  and  state  tax  are  computed  on  a  progressive  scale.  Higher 
brackets  of  marginal  income  are  taxed  at  higher  rates.  The  table 
below  shows  the  marginal  rate  at  which  income  earned  in  that  bracket 
is  taxed: 


Income                       Federal          State
                             Marginal Rate    Marginal Rate

FIRST $20,000                10%              2%
NEXT $15,000                 20%              4%
NEXT $10,000                 30%              7%
NEXT $5,000                  40%              12%
ANY AMOUNT OVER $50,000      50%              18%
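The progressive scale can be sketched in Python as follows. This is an illustration of the rule as described, not a transcription of the experimental program. It assumes the bracket widths are read as $20,000, $15,000, $10,000 and $5,000 (so that they sum to the stated $50,000) and the federal rates as 10, 20, 30, 40 and 50 percent; the names are hypothetical.

```python
# (bracket width, federal marginal rate, state marginal rate);
# a width of None means "any amount over the brackets above".
BRACKETS = [
    (20000, 0.10, 0.02),
    (15000, 0.20, 0.04),
    (10000, 0.30, 0.07),
    (5000, 0.40, 0.12),
    (None, 0.50, 0.18),
]

FEDERAL, STATE = 0, 1


def marginal_tax(taxable_income, jurisdiction):
    """Tax each bracket's slice of income at that bracket's rate."""
    remaining = max(taxable_income, 0.0)
    tax = 0.0
    for width, *rates in BRACKETS:
        portion = remaining if width is None else min(remaining, width)
        tax += portion * rates[jurisdiction]
        remaining -= portion
        if remaining <= 0:
            break
    return tax


federal_tax = marginal_tax(40000, FEDERAL)
state_tax = marginal_tax(40000, STATE)
```

For a taxable income of $40,000 the federal slices are 20,000 at 10%, 15,000 at 20% and 5,000 at 30%.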



20 DIM X[5,3]
40 FOR I=1 TO 5
60 FOR J=1 TO 3
80 READ X[I,J]
100 NEXT J
120 NEXT I
140 DATA 20000,.1,.02
160 DATA 35000.,.2,.04
180 DATA 45000.,.3,.07
200 DATA 50000.,.4,.12
220 DATA 55000.,.5,.18
240 READ W0,W1,A0,R,Y0,Y1,Y2,G0,G1,C0,C1,C2,C3,C4,C5
260 READ D0,D1,D2,D3,D4,T0,T1,T2,T3,A1,E
280 O0=0
300 O1=0
320 S0=0
340 S1=0
360 S0=A0+Y0
380 S1=A0+Y0
400 S1=S1+R
420 IF Y1<100 THEN 480
440 S0=S0+Y1-100
460 S1=S1+Y1-100
480 IF G1>G0 THEN 540
500 S0=S0+G0-G1
520 S1=S1+G0-G1
540 IF C0<C1+C2 THEN 600
560 S0=S0+(C0-C1-C2)/2
580 S1=S1+(C0-C1-C2)/2
600 IF C3>C4+C5 THEN 660
620 S0=S0+(C3-C4-C5)/2
640 S1=S1+(C3-C4-C5)/2
660 REM COMPUTE OFFSETS
680 IF C1+C2>C0 THEN 820
700 IF C1+C2-C0>3000 THEN 780
720 O0=O0+C1+C2-C0
740 O1=O1+C1+C2-C0
760 GOTO 820
780 O0=O0+3000
800 O1=O1+3000
820 IF D1>2000 THEN 880
840 Q=D1
860 GOTO 900
880 Q=2000
900 IF D0<=.005*(Y0+Y1+Y2) THEN 940
920 Q=Q+D0-.005*(Y0+Y1+Y2)
940 Q=Q+D2/2
960 IF Q<=.03*(Y0+Y1+Y2) THEN 1020
980 O0=O0+Q-.03*(Y0+Y1+Y2)
1000 O1=O1+Q-.03*(Y0+Y1+Y2)
1020 O1=O1+D3
1040 O0=O0+D3
1060 IF D4>.25*(Y0+Y1+Y2) THEN 1140
1080 O1=O1+D4
1100 O0=O0+D4
1120 GOTO 1180
1140 O1=O1+.25*(Y0+Y1+Y2)
1160 O0=O0+.25*(Y0+Y1+Y2)
1180 O1=O1+T0
1200 O0=O0+T0
1220 O1=O1+T1
1240 O1=O1+T2
1260 O0=O0+T3
1280 O1=O1+T3
1300 O0=O0+T3
1320 O1=O1+A1
1340 O0=O0+A1
1360 O1=O1+E
1380 O0=O0+E
1400 U0=S0-O0
1420 U1=S1-O1
1440 P1=0
1460 FOR I=1 TO 5
1480 IF U0>X[I,1] THEN 1540
1500 P1=P1+(U0-X[I,1])*X[I,2]
1520 GOTO 1640
1540 P1=P1+X[I,1]*X[I,2]
1560 GOTO 1620
1580 NEXT I
1600 P1=P1+(U0-X[5,1])*X[5,2]
1620 P0=0
1640 P0=0
1660 FOR I=1 TO 5
1680 IF U1>X[I,1] THEN 1740
1700 P0=P0+(U1-X[I,1])*X[I,3]
1720 GOTO 1800
1740 P0=P0+X[I,1]*X[I,3]
1760 NEXT I
1780 P0=P0+(U1-X[5,1])*X[5,3]
1800 PRINT "                    STATE      FEDERAL"
1820 PRINT
1840 PRINT "  INCOME SOURCES=";
1860 PRINT USING 1880;S1
1880 IMAGE #,DDDDDDDD
1900 PRINT "   ";
1920 PRINT USING 1880;S0
1940 PRINT ""
1960 PRINT "  INCOME OFFSETS=";
1980 PRINT USING 1880;O1
2000 PRINT "   ";
2020 PRINT USING 1880;O0
2040 PRINT ""
2060 PRINT "  NET TAXABLE   =";
2080 PRINT USING 1880;U1
2100 PRINT "   ";
2120 PRINT USING 1880;U0
2140 PRINT ""
2160 PRINT "  INCOME TAX    =";
2180 PRINT USING 1880;P0
2200 PRINT "   ";
2220 PRINT USING 1880;P1
2240 PRINT ""
2260 PRINT "  WITHHOLDING   =";
2280 PRINT USING 1880;W1
2300 PRINT "   ";
2320 PRINT USING 1880;W0
2340 PRINT ""
2360 P0=P0-W1
2380 P1=P1-W0
2400 PRINT "  TAX DUE       =";
2420 PRINT USING 1880;P0
2440 PRINT "   ";
2460 PRINT USING 1880;P1
2480 PRINT ""

Figure C.1. Experimental program ITAX00


Figure  C.l. — Continued 


LNPR 




Program Overview

LNPR provides solutions to linear programming problems by use of the
simplex solution technique. Linear programming problems are
optimization problems in which a linear objective function of the form

    $Z = c_1 x_1 + c_2 x_2 + \cdots + c_n x_n$

is maximized or minimized subject to a set of constraints of the form

    $a_{11} x_1 + a_{12} x_2 + \cdots + a_{1n} x_n \;\{\le, =, \ge\}\; b_1$
    $a_{21} x_1 + a_{22} x_2 + \cdots + a_{2n} x_n \;\{\le, =, \ge\}\; b_2$
    $\vdots$
    $a_{m1} x_1 + a_{m2} x_2 + \cdots + a_{mn} x_n \;\{\le, =, \ge\}\; b_m$

    $x_j \ge 0$ for all $j$.

LNPR provides the optimal values for the decision variables
$x_j$ for maximizing the value of Z.

Program  Inputs 

Inputs to LNPR include the number of decision variables (n); the number
of "less than" constraints (l); the number of equality constraints (e);
the number of "greater than" constraints (g); the coefficients of the
objective function ($c_1, \ldots, c_n$); and the coefficients of each
constraint, row by row ($a_{ij}$), with the "right hand sides" ($b_i$).

Data for LNPR should begin at line 9900 in the following order:

9900 DATA n, l, g, e
9910 DATA c1, c2, ..., cn
9920 DATA a11, a12, ..., a1n, b1
9930 DATA a21, a22, ..., a2n, b2
     ...
     DATA am1, am2, ..., amn, bm


Outputs 


LNPR outputs the optimal values of the decision variables
$x_1, \ldots, x_n$; the optimal value for the objective function Z; and the
array containing coefficients and "right hand side" values for all
constraints and objective functions.

The  LP  problem  is  solved  by  either  the  dual  simplex  algorithm  or  the 
primal  simplex  algorithm.  In  the  former  case,  LNPR  prints  out 

DUAL  SIMPLEX  ALGORITHM  TO  BE  USED 

In the latter case,

PRIMAL  SIMPLEX  ALGORITHM  TO  BE  USED 

is  reported,  and  the  current  value  of  the  LP  tableau  is  printed,  in 
either  case. 

In the case that any solution is impossible when subject to the input
constraints, LNPR prints out:

PROBLEM  INFEASIBLE 

In the case that a solution is permissible with a variable of unbounded
value, LNPR prints out:

PROBLEM  UNBOUNDED 

LNPR outputs, at every iteration of the solution algorithm, the values
of the matrix which contains the data for the linear programming problem.
This matrix is called the linear programming (LP) tableau and is
composed initially of

•  the values of the coefficients of the constraints,

•  the values of the constraint right hand sides

•  the values of the coefficients of the objective
   function, or their negatives

as shown in Figure C.2.

If the LP problem can be solved, LNPR prints out the values of the
decision variables ($x_1, x_2, \ldots, x_n$) which optimize the problem, and
reports the constraints which are binding (exactly satisfied).


Program  Processing 

Contained within LNPR are two algorithms which the program can use to
solve the LP problem: the dual simplex method and the primal simplex
method. The dual simplex method is chosen by LNPR whenever all of the
original objective function coefficients ($c_1, c_2, \ldots, c_n$) are
nonpositive and is employed because it generates fewer artificial
variables than the primal algorithm. In all other cases ($c_j > 0$ for
some j), the primal simplex method is used.

Whereas the dual simplex algorithm can only be used on problems with
nonpositive objective function coefficients, the primal algorithm can
only handle constraints with nonnegative right hand sides. Hence, the
first step performed by LNPR is to

•  determine if the dual simplex algorithm can be used
   ($c_j \le 0$ for all j); if so, all less-than constraints
   are left as they were input, and all greater-than
   constraints are converted to less-than constraints by
   multiplying the coefficients ($a_{i1}, a_{i2}, \ldots, a_{in}$) and right hand
   side ($b_i$) by minus one (-1).

•  if the primal simplex algorithm must be used (at least
   one $c_j > 0$) then all constraints with negative right
   hand sides ($b_i < 0$) are multiplied through by minus
   one (-1), turning less-than constraints into
   greater-thans, and vice versa.
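The preliminary step described above can be sketched in Python as follows. This is an illustration of the stated rules rather than a transcription of LNPR; the constraint representation (coefficients, sense, right hand side) and all names are assumptions.

```python
FLIP = {"<=": ">=", ">=": "<=", "=": "="}


def choose_method(objective):
    """Dual simplex if every objective coefficient is nonpositive."""
    return "dual" if all(c <= 0 for c in objective) else "primal"


def negate(constraint):
    """Multiply a constraint through by -1, reversing its sense."""
    coeffs, sense, rhs = constraint
    return ([-a for a in coeffs], FLIP[sense], -rhs)


def normalize(constraints, method):
    """Apply the sign conventions each algorithm expects."""
    result = []
    for constraint in constraints:
        _, sense, rhs = constraint
        if method == "dual" and sense == ">=":
            constraint = negate(constraint)   # greater-than -> less-than
        elif method == "primal" and rhs < 0:
            constraint = negate(constraint)   # make right hand side >= 0
        result.append(constraint)
    return result


dual_rows = normalize([([1, 1], ">=", 4)], choose_method([-2, -3]))
primal_rows = normalize([([1, -2], "<=", -3), ([1, 1], "<=", 5)],
                        choose_method([3, -1]))
```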

The last preliminary step is to add slack ($S_j$), surplus ($T_j$), and
artificial ($R_j$) variables to the constraints.




A matrix A is used to store the coefficients and right hand sides at
any stage of the optimization process. Each step of each simplex
algorithm transforms A until a stopping condition is met. Each
transformed constraint and objective function is expressed as an equation
and occupies one row of the A matrix.

Inequalities of the form

    $a_{i1} x_1 + a_{i2} x_2 + \cdots + a_{in} x_n \le b_i$

are transformed into equations of the form

    $a_{i1} x_1 + a_{i2} x_2 + \cdots + a_{in} x_n + S_j = b_i$ .

Equalities of the form

    $a_{i1} x_1 + a_{i2} x_2 + \cdots + a_{in} x_n = b_i$

are transformed into equations of the form

    $a_{i1} x_1 + a_{i2} x_2 + \cdots + a_{in} x_n + R_j = b_i$ .

Inequalities of the form

    $a_{i1} x_1 + a_{i2} x_2 + \cdots + a_{in} x_n \ge b_i$

are transformed into equations of the form

    $a_{i1} x_1 + a_{i2} x_2 + \cdots + a_{in} x_n - T_j + R_j = b_i$ .

Each equation above, in addition to the objective function,
$Z = c_1 x_1 + \cdots + c_n x_n$, is represented row by row in A by storing
the coefficients for the variables $x_1, \ldots, x_n$; $S_1, \ldots, S_l$;
$T_1, \ldots, T_g$; $R_1, \ldots, R_g$; $R_1, \ldots, R_e$. The final column of
the A matrix is used to store the right hand sides of the equations.
A diagram of A is shown in Figure C.2.

The primal simplex method for solving a linear programming problem is a
cyclic procedure for choosing n of the variables and assigning (or
computing) a value for each so that the problem constraints remain
satisfied. Those variables so chosen are termed basic variables, and their
values are equal to the current right hand sides of the n constraints
found in the last column of the A matrix.

Each cycle of the simplex method requires 4 steps. First, the current
solution must be tested to see if it is optimal. Second, a new
variable must be picked to be basic. Third, a currently-basic variable
must be chosen to be excluded from the set of basic variables. And
fourth, the matrix must be transformed so that the last column reflects
the values of the currently basic variables. The procedure for each
step is:

  I.  Optimality Test: The current objective function
      coefficients equal, for any non-basic variable, the
      marginal amount by which the objective function will
      increase if a variable is picked for inclusion in
      the next set of basic variables. If all objective
      function coefficients are negative, then the current
      solution is optimal. If one or more non-basic
      variable's coefficient in the objective function is
      positive, then ....

 II.  Choice of Entering Variable: Choose the variable
      whose objective function coefficient is most negative,
      say the variable whose coefficients are found in the
      kth column.

III.  Choice of Exiting Variable: If a coefficient, $a_{ik}$,
      in column k is positive, then any increase in the
      entering variable's value will decrease the value
      of the basic variable associated with row i. One
      wishes to increase the entering variable's value
      to the point where one currently-basic variable's
      value is reduced to zero.

      For every positive entry $a_{ik}$ in column k, compute
      the ratio of $a_{ik}$ to the right hand side $b_i$. Choose
      the smallest ratio, say $a_{rk}/b_r$, occurring in row r.
      Row r is the pivot row. If every entry in column k
      is zero or negative, then the linear programming
      problem is unbounded.




 IV.  Pivoting: For the A matrix to properly reflect the
      coefficients and right hand side values in terms of
      the new basic variables, it must be transformed by
      "pivoting" on $a_{rk}$. Mechanically, this translates
      into performing elementary row operations on matrix
      A until the values of the coefficients in the kth
      column are zero for every row except row r, where $a_{rk}$
      equals 1. The elementary row operations that are
      applied are:

      •  every element in the rth row is divided
         by $a_{rk}$.

      •  from every row i in A, except the rth, is
         subtracted the rth row multiplied by $a_{ik}$.

         That is, for every row element $a_{ij}$, compute

             $a_{ij} \leftarrow a_{ij} - a_{rj} a_{ik}$ .

         Included in this pivoting is every row in A
         except the rth.
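The four steps can be sketched as a short Python routine. This is an illustrative textbook-style implementation, not a transcription of LNPR. It assumes a tableau whose last row holds the negated objective coefficients of a maximization problem and whose last column holds the right hand sides, and it uses the conventional ratio test, dividing the right hand side by the positive column entry (the form that line 3540 of the listing also computes). All names are hypothetical.

```python
from fractions import Fraction


def pivot(A, r, k):
    """Scale row r so A[r][k] is 1, then clear column k in other rows."""
    p = A[r][k]
    A[r] = [v / p for v in A[r]]
    for i in range(len(A)):
        if i != r and A[i][k] != 0:
            factor = A[i][k]
            A[i] = [a - factor * b for a, b in zip(A[i], A[r])]


def choose_entering(objective_row):
    """Column with the most negative coefficient, or None if optimal."""
    k = min(range(len(objective_row)), key=lambda j: objective_row[j])
    return k if objective_row[k] < 0 else None


def choose_exiting(A, k, m):
    """Row with the smallest rhs / a_ik over positive a_ik, or None."""
    best = None
    for i in range(m):
        if A[i][k] > 0:
            ratio = A[i][-1] / A[i][k]
            if best is None or ratio < best[0]:
                best = (ratio, i)
    return None if best is None else best[1]


def primal_simplex(A, m):
    """Iterate on a tableau with m constraint rows plus objective row."""
    while True:
        k = choose_entering(A[m][:-1])
        if k is None:
            return "optimal"
        r = choose_exiting(A, k, m)
        if r is None:
            return "unbounded"
        pivot(A, r, k)


# Maximize 3x + 2y subject to x + y <= 4, x + 3y <= 6, x <= 3.
tableau = [[Fraction(v) for v in row] for row in [
    [1, 1, 1, 0, 0, 4],
    [1, 3, 0, 1, 0, 6],
    [1, 0, 0, 0, 1, 3],
    [-3, -2, 0, 0, 0, 0],
]]
status = primal_simplex(tableau, 3)
```

In this example the objective value ends up in the last entry of the objective row.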

The dual simplex method for solving a linear programming problem is
also a cyclic procedure which is the "mirror image" of the primal
simplex method. The "goal" of the dual simplex routine is to transform
the constraint right hand sides ($b_i$'s) until all are positive. The
steps involved are:

  I.  Feasibility Test: If all the right hand sides ($b_i$'s)
      are positive, an optimal solution has been reached;
      otherwise ....

 II.  Choice of Exiting Variable: Choose the most negative
      right hand side, found in, say, row r. Row r will be
      the pivot row.

III.  Choice of Entering Variable: For every negative
      coefficient (say, $a_{rj}$) in row r, compute the ratio of
      the current objective function coefficient $c_j$ to $a_{rj}$.
      Compute a ratio for each negative entry in row r, and
      choose the smallest ratio, say $c_k/a_{rk}$. Column k will
      be the pivot column. If all the entries in row r are
      positive, the linear programming problem is infeasible.

 IV.  Pivoting: Pivot on $a_{rk}$ as described above in the
      primal simplex algorithm.
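A minimal Python sketch of the dual simplex cycle follows, under the same assumed tableau layout as a conventional simplex tableau (objective row last, right hand sides in the last column). It compares ratios by absolute value, as line 4100 of the listing does; the function names and the example problem are illustrative assumptions.

```python
from fractions import Fraction


def pivot(A, r, k):
    """Scale row r so A[r][k] is 1, then clear column k in other rows."""
    p = A[r][k]
    A[r] = [v / p for v in A[r]]
    for i in range(len(A)):
        if i != r and A[i][k] != 0:
            factor = A[i][k]
            A[i] = [a - factor * b for a, b in zip(A[i], A[r])]


def dual_simplex(A, m):
    """Iterate on a tableau with m constraint rows plus objective row."""
    while True:
        # Feasibility test and exiting variable: most negative rhs.
        r = min(range(m), key=lambda i: A[i][-1])
        if A[r][-1] >= 0:
            return "optimal"
        # Entering variable: smallest |c_j / a_rj| over negative a_rj.
        candidates = [j for j in range(len(A[r]) - 1) if A[r][j] < 0]
        if not candidates:
            return "infeasible"
        k = min(candidates, key=lambda j: abs(A[m][j] / A[r][j]))
        pivot(A, r, k)


# Maximize -2x - 3y subject to x + y >= 4 and x >= 1, with both
# constraints already multiplied by -1 and given slack columns.
tableau = [[Fraction(v) for v in row] for row in [
    [-1, -1, 1, 0, -4],
    [-1, 0, 0, 1, -1],
    [2, 3, 0, 0, 0],
]]
status = dual_simplex(tableau, 2)
```

The example stops after one pivot with objective value -8, that is, a minimum of 8 for 2x + 3y at x = 4, y = 0.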




20 DIM A[10,20],Q[10,20],R[10],N[20],B[10]
40 DIM X[20]
60 FOR I=1 TO 10
80 FOR J=1 TO 20
100 A[I,J]=0
120 NEXT J
140 NEXT I
160 B0=0
180 N0=0
200 READ N,L,G,E
220 S=N
240 M=L+E+G
260 FOR I=1 TO M+1
280 FOR J=1 TO N+1
300 Q[I,J]=0
320 NEXT J
340 NEXT I
360 L0=N+1
380 E0=0
400 G0=0
420 FOR J=1 TO N
440 READ A[M+1,J]
460 Q[M+1,J]=A[M+1,J]
480 NEXT J
500 FOR J=1 TO N
520 IF A[M+1,J]>0 THEN 620
540 NEXT J
560 REM ** DUAL SIMPLEX CAN POSSIBLE BE USED
580 D=1
600 GOTO 660
620 REM ** 2 PHASE PRIMAL SIMPLEX MUST BE USED
640 D=0
660 REM
680 REM READ IN LESS THAN CONSTRANTS
700 FOR I=1 TO L
720 FOR J=1 TO N
740 READ A[I,J]
760 Q[I,J]=A[I,J]
780 NEXT J
800 READ R[I]
820 Q[I,N+1]=R[I]
840 IF D=1 OR R[I]>=0 THEN 980
860 FOR J=1 TO N
880 A[I,J]=-A[I,J]
900 NEXT J
920 R[I]=-R[I]
940 GOSUB 5400
960 GOTO 1000
980 GOSUB 5120
1000 NEXT I
1020 REM READ IN GREATER THAN CONSTRAINTS
1040 FOR I=L+E+1 TO L+E+G
1060 FOR J=1 TO N
1080 READ A[I,J]
1100 Q[I,J]=A[I,J]
1120 NEXT J
1140 READ R[I]
1160 Q[I,N+1]=R[I]
1180 IF D=0 OR R[I]>=0 THEN 1320
1200 FOR J=1 TO N
1220 A[I,J]=-A[I,J]
1240 NEXT J
1260 R[I]=-R[I]
1280 GOSUB 5120
1300 GOTO 1340
1320 GOSUB 5400
1340 NEXT I
1360 REM ** READ IN EQUALITY CONSTRAINTS
1380 FOR I=L+1 TO L+E
1400 FOR J=1 TO N
1420 READ A[I,J]
1440 Q[I,J]=A[I,J]
1460 NEXT J
1480 READ R[I]
1500 Q[I,N+1]=R[I]
1520 IF D=1 OR R[I]>=0 THEN 1620
1540 FOR J=1 TO N
1560 A[I,J]=-A[I,J]
1580 NEXT J
1600 R[I]=-R[I]
1620 GOSUB 5260
1640 NEXT I
1660 FOR J=1 TO N
1700 N[J]=J
1720 NEXT J
1740 FOR I=1 TO N
1760 A[I,S+1]=R[I]
1780 NEXT I
1800 FOR I=1 TO N
1820 IF A[I,S+1]<0 THEN 1880
1840 NEXT I
1860 GOTO 2160
1880 REM PHASE II ACHIEVE PRIMAL FEASIBILITY
1900 PRINT "DUAL SIMPLEX ALGORITHM TO BE USED"
1920 FOR O=1 TO 1000
1940 GOSUB 4660
1960 GOSUB 3660
1980 IF R9=-1 THEN 2100
2000 GOSUB 3920
2020 IF C9#-1 THEN 2100
2040 PRINT "PROBLEM INFEASIBLE"
2060 GOSUB 4660
2080 STOP
2100 GOSUB 4220
2120 NEXT O
2140 A[M+1,S+1]=-A[M+1,S+1]
2160 REM PHASE III ACHIEVE PRIMAL OPTIMALITY
2180 FOR J=1 TO S+1
2200 A[M+1,J]=-A[M+1,J]
2220 NEXT J
2240 PRINT "PRIMAL SIMPLEX ALGORITHM TO BE USED"
2260 FOR O=1 TO 1000
2280 GOSUB 4660
2300 GOSUB 3100
2320 IF C9=-1 THEN 2480
2340 GOSUB 3380
2360 IF R9#-1 THEN 2440
2380 PRINT "PROBLEM UNBOUNDED"
2400 GOSUB 4660
2420 STOP
2440 GOSUB 4220
2460 NEXT O
2480 FOR J=1 TO N
2500 IF B[I]<0 THEN 2560
2520 NEXT J
2540 GOTO 2600
2560 PRINT "PROBLEM INFEASIBLE"
2580 STOP
2600 PRINT "OPTIMAL OBJECTIVE FUNCTION VALUE IS ";A[M+1,S+1]
2620 PRINT
2640 PRINT "VALUES OF DECISION VARIABLES"
2660 PRINT
2680 PRINT
2700 FOR I=1 TO M
2720 IF B[I]>N THEN 2780
2740 X[B[I]]=A[I,S+1]
2760 PRINT "X";B[I];" = ";A[I,S+1]
2780 NEXT I
2800 FOR J=1 TO S-M
2820 IF ABS(N[J])>N THEN 2880
2840 X[N[J]]=0
2860 PRINT "X";N[J];" = 0"
2880 NEXT J
2900 PRINT
2920 FOR I=1 TO M
2960 FOR J=1 TO N
2980 V=V+Q[I,J]*X[J]
3000 NEXT J
3020 IF V#Q[I,N+1] THEN 3060
3040 PRINT "CONSTRAINT";I;"BINDING"
3060 NEXT I
3080 STOP
3100 REM ****************************************
3120 REM * PRIMAL OPTIMALITY TEST:              *
3140 REM * DETERMINE ENTERING VARIABLE          *
3160 REM ****************************************
3180 C9=-1
3200 Y9=0
3220 FOR J1=1 TO S-M
3240 J=ABS(N[J1])
3260 V=A[M+1,J]
3280 IF V>=Y9 THEN 3340
3300 Y9=V
3320 C9=J
3340 NEXT J1
3360 RETURN
3380 REM ****************************************
3400 REM * PRIMAL UNBOUNDEDNESS TEST:           *
3420 REM * DETERMINE EXITING VARIABLE           *
3440 REM ****************************************
3460 R9=-1
3480 Y9=1.E+38
3500 FOR I=1 TO M
3520 IF A[I,C9]<=0 THEN 3620
3540 V=A[I,S+1]/A[I,C9]
3560 IF V>Y9 THEN 3620
3580 R9=I
3600 Y9=V
3620 NEXT I
3640 RETURN
3660 REM ****************************************
3680 REM * DUAL EXITING :                       *
3700 REM * TEST PRIMAL FEASIBILITY              *
3720 REM ****************************************
3740 R9=-1
3760 Y9=0
3780 FOR I=1 TO M
3800 V=A[I,S+1]
3820 IF V>Y9 THEN 3880
3840 Y9=V
3860 R9=I
3880 NEXT I
3900 RETURN
3920 REM ****************************************
3940 REM * DUAL ENTERING :                      *
3960 REM * TEST FOR PRIMAL FEASIBILITY          *
3980 REM ****************************************
4000 C9=-1
4020 Y9=0
4040 FOR J1=1 TO S-M
4060 J=ABS(N[J1])
4080 IF A[R9,J]>=0 THEN 4180
4100 V=ABS(A[M+1,J]/A[R9,J])
4120 IF V>=Y9 THEN 4180
4140 C9=J
4160 Y9=V
4180 NEXT J1
4200 RETURN
4220 REM ********************************
4240 REM * PIVOT ON A(R9,C9)            *
4260 REM ********************************
4300 FOR J=1 TO S+1
4320 A[R9,J]=A[R9,J]/A[R9,C9]
4340 NEXT J
4360 FOR I=1 TO M+1
4380 IF I=R9 THEN 4480
4420 FOR J=1 TO S+1
4440 A[I,J]=A[I,J]-A[R9,J]*A[I,C9]
4460 NEXT J
4480 NEXT I
4500 FOR J1=1 TO S-M
4520 IF N[J1]=C9 THEN 4560
4540 NEXT J1
4560 REM EXCHANGE INDICES
4580 T=N[J1]
4600 N[J1]=B[R9]
4620 B[R9]=T
4640 RETURN
4660 REM ***************************
4680 REM * PRINT TABLEAU           *
4700 REM ***************************
4720 PRINT "    ";
4740 FOR J=1 TO S
4760 PRINT "X";J;
4780 NEXT J
4800 PRINT
4820 FOR I=1 TO M+1
4840 PRINT "X";
4860 IF I=M+1 THEN 4920
4880 PRINT ABS(B[I]);
4900 GOTO 4940
4920 PRINT " 0 ";
4940 FOR J=1 TO S+1
4960 PRINT USING 4980;A[I,J]
4980 IMAGE #,DDDD.DD
5000 NEXT J
5020 PRINT
5040 NEXT I
5060 PRINT
5080 PRINT
5100 RETURN
5120 REM ** HANDLE LESS THANS
5140 L0=L0+1
5160 A[I,L0]=1
5180 B0=B0+1
5200 B[B0]=L0
5220 S=S+1
5240 RETURN
5260 REM ** HANDLE EQUALITIES
5280 E0=E0+1
5300 A[I,N+G+L+G0+E0]=1
5320 B0=B0+1
5340 B[B0]=-(N+G+L+G0+E0)
5360 S=S+1
5380 RETURN
5400 REM ** HANDLE GREATER THANS
5420 G0=G0+1
5440 A[I,N+L+G-G0+1]=-1
5460 A[I,N+L+G+G0]=1
5480 N0=N0+1
5500 N[N0]=N+L+G-G0+1
5520 B0=B0+1
5540 B[B0]=-(N+L+G+G0)
5560 S=S+1
5580 RETURN
5590 END

Figure C.3. Experimental program LNPR00


Program Inputs

Coefficients of P(z) are input as DATA in line 9900, as

9900 DATA a0, a1, a2, a3, a4, a5


Program Outputs

The program computes the four local optima and prints the answer as

LOCAL OPTIMA ARE 3.146 -25.6 39.128 5.543

or prints the message

INPUT POLYNOMIAL DOES NOT HAVE FOUR LOCAL OPTIMA


Program  Processing 


The roots of a quartic equation

    $F(z) = b_0 + b_1 z + b_2 z^2 + b_3 z^3 + z^4$

are found by finding a root y* for the resolvent cubic equation

    $G(y) = y^3 + c_2 y^2 + c_1 y + c_0$

where

    $c_2 = -b_2$
    $c_1 = (b_3 b_1 - 4 b_0)$
    $c_0 = -b_3^2 b_0 + 4 b_2 b_0 - b_1^2$ .

The roots of G(y) are, in turn, found by substituting for y the value
x - c2/3 and solving for the roots of

    $H(x) = x^3 + d_1 x + d_0$

where

    $d_1 = (3 c_1 - c_2^2)/3$
    $d_0 = (2 c_2^3 - 9 c_2 c_1 + 27 c_0)/27$ .

Let $S = d_0^2/4 + d_1^3/27$. If F(z) has four real roots then S < 0 and a
root x* of H(x) is given by
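The expression that follows this sentence does not survive in this copy. As a hedged illustration only, the Python sketch below computes the resolvent and depressed cubic coefficients exactly as defined above and then uses the standard trigonometric solution of a depressed cubic with three real roots, $x^* = 2\sqrt{-d_1/3}\,\cos(\theta/3)$ with $\cos\theta = (-d_0/2)/\sqrt{-d_1^3/27}$. That formula is an assumption about what the missing text states, and all function names are hypothetical.

```python
import math


def resolvent_coefficients(b0, b1, b2, b3):
    """Coefficients c2, c1, c0 of the resolvent cubic G(y)."""
    c2 = -b2
    c1 = b3 * b1 - 4 * b0
    c0 = -b3 ** 2 * b0 + 4 * b2 * b0 - b1 ** 2
    return c2, c1, c0


def depressed_coefficients(c2, c1, c0):
    """Coefficients d1, d0 of H(x) after substituting y = x - c2/3."""
    d1 = (3 * c1 - c2 ** 2) / 3
    d0 = (2 * c2 ** 3 - 9 * c2 * c1 + 27 * c0) / 27
    return d1, d0


def resolvent_root(b0, b1, b2, b3):
    """Return one real root y* of G(y) when S < 0."""
    c2, c1, c0 = resolvent_coefficients(b0, b1, b2, b3)
    d1, d0 = depressed_coefficients(c2, c1, c0)
    s = d0 ** 2 / 4 + d1 ** 3 / 27
    if s >= 0:
        raise ValueError("S >= 0: the quartic lacks four distinct real roots")
    cos_theta = (-d0 / 2) / math.sqrt(-d1 ** 3 / 27)
    theta = math.acos(max(-1.0, min(1.0, cos_theta)))  # clamp rounding error
    x_star = 2 * math.sqrt(-d1 / 3) * math.cos(theta / 3)
    return x_star - c2 / 3


# F(z) = (z-1)(z-2)(z-3)(z-4) = 24 - 50z + 35z^2 - 10z^3 + z^4
y_star = resolvent_root(24, -50, 35, -10)
```

For this quartic the resolvent cubic has roots 10, 11 and 14 (the pairwise products of roots summed, such as 1·2 + 3·4), and the sketch returns the largest.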




10 READ A0,A1,A2,A3,A4,A5
20 B0=A1/(5*A5)
30 B1=2*A2/(5*A5)
40 B2=3*A3/(5*A5)
50 B3=4*A4/(5*A5)
60 C2=-B2
70 C1=(B3*B1-4*B0)
80 C0=-B3^2*B0+4*B2*B0-B1^2
90 D1=(3*C1-C2^2)/3
100 D0=(2*C2^3-9*C2*C1+27*C0)/27
110 S=D0^2+D1^3/27
120 IF S<0 THEN 180
130 A=(-D0/2+SQR(S))^.3333
140 B=(-D0/2-SQR(S))^.3333
150 X=A+B
160 GOTO 210
170 PRINT "INPUT POLYNOMIAL DOES NOT HAVE FOUR LOCAL OPTIMA"
175 GOTO 10
180 LET T1=-D0/(2*SQR(-D1^3/27))
190 LET T=3.14159/2-ATN(T1/SQR(1-T1^2))
200 LET X=2*SQR(-D1/3*COS(T/3))
210 LET Y=X-C2/3
220 R1=B3^2/4-B2+Y
230 IF R1<0 THEN 170
240 R=SQR(R1)
250 IF R=0 THEN 330
260 IF Y^2<4*B0 THEN 170
270 LET P1=3*B3^2/4-2*B2
280 IF P1+P2<0 OR P1-P2<0 THEN 170
290 LET P2=2*SQR(Y^2-4*B0)
300 LET D=SQR(P1+P2)
310 LET E=SQR(P1-P2)
320 GOTO 420
330 REM
340 LET Q1=3*B3^2/4-R^2-2*B2
350 LET Q2=(4*B2*B3-8*B1-B3^3)/(4*R)
360 REM ADJUST FOR NEGATIVE BIAS OF INTERNAL FUNCTIONS
370 Q8=Q1+Q2+.001
380 Q9=Q1-Q2+.001
390 IF Q8<0 OR Q9<0 THEN 170
400 LET D=SQR(Q8)
410 LET E=SQR(Q9)
420 LET Z1=-B3/4+R/2+D/2
430 LET Z2=-B3/4+R/2-D/2
440 LET Z3=-B3/4-D/2+E/2
450 LET Z4=-B3/4-R/2-E/2
460 PRINT "LOCAL OPTIMA ARE ";Z1;Z2;Z3;Z4
470 STOP

Figure C.6. Experimental program OPTM00


SCNR

Program Description

This program reads in statements written in a fictitious programming
language and produces a stream of code pairs corresponding to the
symbols which make up the program read in.

The fictitious programming language which is accepted by SCNR is called
EASY. EASY is not a line-oriented language; two or more statements may
be placed on one line or a statement may be continued on as many lines
as necessary. Every statement is ended with a semicolon. There are
ten (10) statement types in EASY:

<var> := <expression>

DIM <variable>(<bound>, ..., <bound>)

PRINT <expression>, ..., <expression>

FOR <variable> := <variable or constant> TO <variable or constant>

WHILE (<condition>)

IF (<condition>) THEN <statement>

READ <variable>, ..., <variable>

END

GOTO <label>
Other rules of EASY are

•  Comments can be placed anywhere in an EASY program; a comment
   begins with the symbol "/*" and continues until the "*/" is
   found (e.g., " /* THIS IS A COMMENT */").

•  Every EASY statement ends with a semicolon (e.g., " X := X + 1;").

•  Constants are of two types

   -  numeric constants are simple integer numerals or numerals
      with decimal fractions (e.g., "52" or "47.8").

   -  string constants are character strings within single
      quotes; there is no way to represent a single quote
      within a string constant (e.g., " 'THIS IS A STRING ' ").

•  Statement labels may precede a statement and consist of a
   proper identifier followed by a colon (e.g., " NEXT: PRINT A;").

•  Proper identifiers are variable names and statement labels;
   an identifier must be one to thirty characters in length,
   begin with one of the characters {A, B, ..., Z, #, @, $}
   and be entirely composed of the characters {A, B, ..., Z,
   #, $, @, 0, ..., 9}.

•  Blanks are necessary any time a string of characters would
   be ambiguous. Specifically, to separate

   -  a keyword or variable on the left from a numeric
      constant on the right (e.g., " PRINT 10; ").

   -  a keyword from a variable (e.g., " FOR I := 1 TO 10; ").

   Any time one blank is used, more than one blank may be used
   without any change in meaning. This is not true, of course,
   within comments or string constants.

•  Six built-in functions are provided in the EASY language:
   ABS, SQR, SGN, SIN, COS, and ATN; each one takes one
   argument (e.g., " X := SIN(VO#) + SIN(Y); ").
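The proper-identifier rule in the list above can be sketched as a regular expression in Python. The pattern assumes the first-character set runs from A through Z plus the three national characters; the names are illustrative.

```python
import re

# One leading letter or national character, then up to 29 more
# letters, national characters, or digits (thirty characters in all).
IDENTIFIER = re.compile(r"[A-Z#@$][A-Z#@$0-9]{0,29}")


def is_proper_identifier(text):
    """True if text satisfies the EASY identifier rule as read here."""
    return IDENTIFIER.fullmatch(text) is not None


checks = {name: is_proper_identifier(name)
          for name in ["A", "A$", "#X1", "1A", "", "total"]}
```

Lowercase input is rejected because the rule lists only upper-case letters; whether SCNR folds case is not stated in the text.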

Each distinct symbol in the language is assigned a symbol class and a
subclass number. These assignments are

Symbol     Class      Subclass
           Number     Number

PRINT      1          1
READ       1          2
DIM        1          3
IF         1          4
THEN       1          5
WHILE      1          6
END        1          7
FOR        1          8
TO         1          9
GOTO       1          10
ABS        2          1
SQR        2          2
SGN        2          3
SIN        2          4
COS        2          5
ATN


9900 DATA "  DIM A(100); "
9910 DATA "  LET X := X + 1; "
9920 DATA "  LET A$ := 'STRING'; "
9930 DATA "  END;"

Program  Outputs 


SCNR outputs all of the information necessary to reconstruct the input
statements. Specifically, SCNR outputs a code pair for each program
symbol, a table of identifiers found, and a table of string constants
found. Each code pair consists of two parts: the first part is the
class number; the second part is either the subclass number (for
keywords, operators, functions, and delimiters) or a value (for numeric
constants).

A  sample  of  the  output  expected  for  the  example  program  above  is: 


IDENTIFIER  INFORMATION 


NUMBER  OF  IDENTIFIERS  FOUND  =  4 
IDENTIFIER  TABLE 
1  1 

2  4 

5  5 

6  7 
IDENTIFIER  STRING 
ALETXA$


STRING  CONSTANT  INFORMATION 

NUMBER OF STRING CONSTANTS FOUND = 1
STRING  CONSTANT  TABLE 
1  6 

CONSTANT  STRING 
STRING

SCANNER  TOKEN  CODE  PAIRS 

1  3 

6  1

7  ^ 

4  100 

7  5 

7  2 

6  2 

6  3



3  14 

4  1 

7  2 

6  2 

6  4 

3  14 

5  1 

7  2 

1  7 

7  2 

SCNR  prints  out  an  error  message  and  stops  if  any  of  the  following 
limitations  are  violated: 

•  the  total  number  of  identifiers  must  not  exceed  20. 

•  the  total  number  of  string  constants  must  not  exceed  10. 

•  the  combined  length  of  all  string  constants  must  not  exceed  255. 

•  the  combined  length  of  all  identifiers  must  not  exceed  255. 

•  the total number of symbols must not exceed 100.

Program Processing


SCNR is implemented as a finite state automaton in which the program
progresses from state to state until it reaches an action to take.
There are nine states corresponding to distinct "situations" in the
decoding of a symbol:

1.  initial state

2.  currently  decoding  integer  or  real  numeric  constant 

3.  currently  decoding  real  numeric  constant 

4.  currently  decoding  keyword  or  identifier 

5.  currently  decoding  identifier 

6.  currently  decoding  *  or  ** 

7.  currently  decoding  j  ^  j  or  j  j 

8.  currently  decoding  :  or  := 

9.  currently  decoding  /  or  /* 

There are 14 character types into which the character set is grouped.
They are

1.  a letter {A, B, ..., Z}

2.  a digit {0, 1, ..., 9}

3.  a national character {#, @, $}

4.  a period

5.  an asterisk

6.  an equal sign

7.  an inequality sign

8.  a colon

9.  a plus or minus sign {+, -}

10.  a slash {/}

11.  a delimiter {,;()}

12.  a quotation mark {'}

13.  a blank

14.  a fictitious end-of-program symbol

SCNR  progresses  from  state  to  state  in  decoding  the  input.  A  state 
table  contains,  for  every  combination  of  current  state  (row)  and  next 
character  type  (column),  the  next  state  to  assume  or  action  to  take. 
There  are  eight  actions: 

ACTION  DESCRIPTION

1       A keyword or identifier has been found; determine which
        one; if keyword, produce pair;

2       A quotation mark has been found; read incoming characters
        until another quotation mark is found; store
        string constant in string table and produce code pair.

3       An operator has been found; determine which one;
        produce code pair.

4       A numeric constant has been found; determine which one,
        and produce code pair.

5       End of program; output code pairs and tables.

6       An identifier has been found; if "NOT", "AND", or "OR"
        proceed to action 3, else ensure identifier is stored
        in identifier table and produce code pair.

7       A delimiter has been found; determine which one; produce
        code pair.

8       A comment-begin (/*) has been found; read and ignore all
        characters until comment-end is found (*/).
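
The table-driven scheme described above (a state table indexed by current state and next character type, whose entries are either a next state or an action) can be sketched in a few lines of Python. This is a deliberately reduced illustration, not SCNR itself: it assumes only three states, four character types, and three actions, and every name in it (char_type, STATE_TABLE, scan) is hypothetical. Following the convention stated later in this appendix, positive entries are next-state numbers and negative entries are action numbers.

```python
LETTER, DIGIT, BLANK, END = range(4)   # reduced set of character types

def char_type(ch):
    """Classify one character; None stands for the end-of-program symbol."""
    if ch is None:
        return END
    if ch.isalpha():
        return LETTER
    if ch.isdigit():
        return DIGIT
    if ch == " ":
        return BLANK
    raise ValueError("illegal character: %r" % ch)

# Rows are states (1 initial, 2 in a number, 3 in an identifier); columns
# are character types. Positive = next state, negative = action number.
STATE_TABLE = {
    1: (3, 2, 1, -3),     # action 3: end of program
    2: (-1, 2, -1, -1),   # action 1: a numeric constant is complete
    3: (3, 3, -2, -2),    # action 2: an identifier is complete
}
ACTION_KIND = {1: "number", 2: "identifier"}

def scan(text):
    """Return (kind, lexeme) pairs for the symbols found in text."""
    tokens, state, lexeme, pos = [], 1, "", 0
    while True:
        ch = text[pos] if pos < len(text) else None
        entry = STATE_TABLE[state][char_type(ch)]
        if entry > 0:
            # State transition: keep the character (blanks are skipped in
            # the initial state) and advance to the next one.
            if ch != " ":
                lexeme += ch
            state = entry
            pos += 1
        elif entry == -3:
            return tokens
        else:
            # Action: emit the symbol and return to the initial state; the
            # current character is examined again from there.
            tokens.append((ACTION_KIND[-entry], lexeme))
            state, lexeme = 1, ""

print(scan("AB 12 X9"))
```

The appeal of this design is that the control logic stays fixed while the language being scanned is described entirely by the table, which is also why a test set that exercises every table cell looks so different from a "typical" input program.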



SCNR has been implemented by modularizing the program parts into
routines. One routine exists for each of the eight actions. Four
additional routines are

•  GETLINE - gets the next line by reading the next DATA statement;
prints out the line together with a line number and
removes all but one leading blank.

•  GETCHAR - returns the next significant character and its type;
returns only the first in a string of blanks; when processing
a constant string returns only the character without resetting
character type; returns a type of 14 if end of program is
detected.

•  INIT - initializes all the tables used in the program as well
as status variables. These include

-  Search tables K, F, O, D, I, and Q used to search for
keywords, functions, operators, delimiters, identifiers,
and string constants, respectively. These search tables
are used in conjunction with search strings K$, F$, O$,
D$, I$, and Q$, where

X(I,1) and X(I,2)

contain the first and last character position of the
entry in X$. I and Q are updated during processing
whereas K, F, O, and D are static in terms of current
active size.

-  Search string B$, containing the legal character set;
the B vector is used for determining a character type;
if character X resides at position I in B$ and B(J - 1)
< I ≤ B(J), then X is in class J.

-  The state table S described above; action entries are
stored as negative integers and next-state numbers as
positive integers.

-  Status Variables

Q7 = 1 implies processing a string constant,
     0 otherwise

F7 = 1 implies at least one blank has preceded current
     char,
     0 otherwise
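
The two lookup mechanisms described above, a search table of (first, last) positions into a packed search string and a vector of upper bounds that maps a character position to a character class, can be sketched as follows. This Python fragment is an illustration under stated assumptions rather than a transcription of SCNR: the keyword order is assumed to follow the subclass numbers (the first five keywords are an assumption), the national characters are assumed to be #, @, and $, the inequality signs are assumed to be < and >, and all function and variable names are hypothetical.

```python
from bisect import bisect_left
from itertools import accumulate

def build_search_table(words):
    """Pack words into one search string and record each word's 1-based,
    inclusive (first, last) character positions, like K and K$."""
    packed = "".join(words)
    ends = list(accumulate(len(w) for w in words))
    table = [(end - len(w) + 1, end) for w, end in zip(words, ends)]
    return packed, table

def lookup(symbol, packed, table):
    """Return the 1-based entry number of symbol, or 0 if it is absent."""
    for entry, (first, last) in enumerate(table, start=1):
        if packed[first - 1:last] == symbol:
            return entry
    return 0

# Assumed keyword order; entry numbers play the role of subclass numbers.
KEYWORDS = ["PRINT", "READ", "DIM", "IF", "THEN",
            "WHILE", "END", "FOR", "TO", "GOTO"]
K_STRING, K_TABLE = build_search_table(KEYWORDS)

# Character groups for classes 1 through 13 (class 14, end of program,
# is not a real character). The group contents are assumptions.
GROUPS = ["ABCDEFGHIJKLMNOPQRSTUVWXYZ", "0123456789", "#@$", ".", "*",
          "=", "<>", ":", "+-", "/", ",;()", "'", " "]
LEGAL = "".join(GROUPS)                              # plays the role of B$
BOUNDS = list(accumulate(len(g) for g in GROUPS))    # plays the role of B

def char_class(ch):
    """Return class J such that B(J-1) < I <= B(J), where I is the
    1-based position of ch in the legal character string."""
    position = LEGAL.find(ch) + 1
    if position == 0:
        raise ValueError("illegal character: %r" % ch)
    return bisect_left(BOUNDS, position) + 1

print(lookup("WHILE", K_STRING, K_TABLE), char_class("<"))
```

A binary search over the bounds is used here for brevity; a linear scan over the B vector, as a BASIC program of the period would more likely use, gives the same class.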


10 DIM A$[30]
20 GOSUB 2340
30 LET
40 GOSUB 2030
50 FOR Z8=1 TO 10000
60 LET L=1
70 FOR Z7=1 TO 100000
80 LET C5=S[L,C]
90 L=C5
100 IF C5 >= 1 THEN 130
110 GOTO -C5 OF 180,410,680,800,1010,1360,1560,1650
120 GOTO 160
130 IF C$=" " THEN 150
133 IF LEN(A$)<30 THEN 140
134 E=6
135 GOSUB 3070
140 LET A$[LEN(A$)+1]=C$
150 GOSUB 2030
160 NEXT Z7
170 REM ***********************
180 REM * KEYWORD OR VARIABLE *
190 REM ***********************
200 FOR X=1 TO K9
210 IF A$=K$[K[X,1],K[X,2]] THEN 240
220 NEXT X
230 GOTO 280
240 C9=C9+1
248 C9=C9+1
250 C[C9,1]=1
260 C[C9,2]=X
270 GOTO 1790
280 FOR X=1 TO F9
290 IF A$=F$[F[X,1],F[X,2]] THEN 320
300 NEXT X
310 GOTO 360
320 IF C9<C6 THEN 328
322 E=6
324 GOSUB 3070
328 C9=C9+1
330 C[C9,1]=2
340 C[C9,2]=X
350 GOTO 1790
360 IF A$="AND" THEN 680
370 IF A$="OR" THEN 680
380 IF A$="NOT" THEN 680
390 GOTO 1360
400 REM ***************************
410 REM * PROCESS STRING CONSTANT *
420 REM TURN ON QUOTE MODE
440 REM ***************************
450 Q9=Q9+1
460 IF Q9 <= 10 THEN 490
470 E=2
480 GOSUB 3070
490 IF Q9>1 THEN 520
500 Q[1,1]=1
510 GOTO 530
520 Q[Q9,1]=Q[Q9-1,2]+1
530 FOR Q8=Q[Q9,1] TO 100


Figure  C.8.  Experimental  program  SCNR00 


540 GOSUB 2030
550 IF C$="'" THEN 600
555 IF C#14 THEN 560
556 E=8
558 GOSUB 3070
560 Q$[Q8]=C$
570 NEXT Q8
580 E=3
590 GOSUB 3070
600 Q[Q9,2]=Q8-1
640 C9=C9+1
650 C[C9,1]=5
660 C[C9,2]=Q9
670 GOTO 1790
675 REM ********************
680 REM * PROCESS OPERATOR *
690 REM ********************
700 FOR X=1 TO O9
710 IF A$=O$[O[X,1],O[X,2]] THEN 730
720 NEXT X
730 IF INT(C5)=C5 THEN 760
740 A$[LEN(A$)+1]=C$
750 GOSUB 2030
760 IF C9<C6 THEN 768
762 E=6
764 GOSUB 3070
768 C9=C9+1
770 C[C9,1]=3
780 C[C9,2]=X
790 GOTO 1790
800 REM ****************************
810 REM * PROCESS NUMERIC CONSTANT *
820 REM ****************************
830 DIM N$[10]
840 N$="0123456789"
850 V=0
860 P7=LEN(A$)
870 FOR X=1 TO LEN(A$)
880 IF A$[X,X]#"." THEN 910
890 LET P7=X
900 GOTO 950
910 FOR J=0 TO 9
920 IF N$[J+1,J+1]=A$[X,X] THEN 940
930 NEXT J
940 V=V*10+J
950 NEXT X
960 LET V=V/(10^(LEN(A$)-P7+1))
965 IF C9<C6 THEN 970
966 E=6
967 GOSUB 3070
970 C9=C9+1
980 C[C9,1]=4
990 C[C9,2]=V
1000 GOTO 1790
1010 REM ****************************
1020 REM *
1030 REM * PRINT OUT SCANNER TABLES
1040 REM *
1050 REM ****************************
1055 PRINT LIN(?)




1060 PRINT "IDENTIFIER INFORMATION"
1070 PRINT "======================"
1080 PRINT "NUMBER OF IDENTIFIERS FOUND
1090 PRINT "IDENTIFIER TABLE"
1100 FOR X=1 TO I9
1110 PRINT I[X,1];I[X,2]
1120 NEXT X
1130 PRINT "IDENTIFIER STRING"
1140 FOR X=1 TO LEN(I$) STEP 60
1150 PRINT I$[X,X+59]
1160 NEXT X
1170 PRINT
1180 PRINT "STRING CONSTANT INFORMATION"
1190 PRINT "==========================="
1200 PRINT "NUMBER OF STRING CONSTANTS FOUND
1210 PRINT "STRING CONSTANT TABLE"
1220 FOR X=1 TO Q9
1230 PRINT Q[X,1];Q[X,2]
1240 NEXT X
1250 PRINT "CONSTANT STRING"
1260 FOR X=1 TO LEN(Q$) STEP 60
1270 PRINT Q$[X,X+59]
1280 NEXT X
1290 PRINT
1300 PRINT "SCANNER TOKEN CODE PAIRS"
1310 PRINT "========================"
1320 FOR X=1 TO C9
1330 PRINT C[X,1];C[X,2]
1340 NEXT X
1350 STOP
1370 REM * PROCESS IDENTIFIER *
1380 REM **********************
1390 FOR X=1 TO I9
1400 IF A$=I$[I[X,1],I[X,2]] THEN 1530
1410 NEXT X
1420 I9=I9+1
1427 GOTO 1440
1430 IF I9<20 THEN 1439
1431 E=5
1432 GOTO 3070
1439 I[I9,1]=I[I9-1,2]+1
1440 I[I9,2]=I[I9,1]+LEN(A$)-1
1450 IF I[I9,2] <= 255 THEN 1480
1460 E=4
1470 GOSUB 3070
1480 I$[I[I9,1]]=A$
1488 I$[I[I9,1]]=A$
1498 C9=C9+1
1500 C[C9,1]=6
1510 C[C9,2]=I9
1520 RETURN
1530 C9=C9+1
1539 C[X,1]=6
1540 C[X,2]=X
1550 GOTO 1790
1560 REM ************************
1570 REM * DELIMITER PROCESSING *
1580 REM ************************
1585 IF INT(C5)=C5 THEN 1590




1586 A$[LEN(A$)+1]=C$
1587 GOSUB 2030
1590 FOR X=1 TO D9
1600 IF A$=D$[D[X,1],D[X,2]] THEN 1620
1610 NEXT X
1620 IF C9<C6 THEN 1628
1622 E=6
1624 GOSUB 3070
1628 C9=C9+1
1630 C[C9,1]=7
1640 C[C9,2]=X
1650 GOTO 1790
1670 REM * COMMENT PROCESSING *
1690 DIM Z$[1]
1700 GOSUB 2030
1710 FOR X=1 TO 10000
1720 Z$=C$
1730 GOSUB 2030
1740 IF Z$#"*" THEN 1770
1750 IF Z$="/" THEN 1780
1770 NEXT X
1780 GOSUB 2030
1790 A$=""
1795 NEXT Z8
1800 REM * END OF MAJOR LOOP *
1830 REM ***********************
1840 REM *
1850 REM * GET THE NEXT LINE
1860 REM *
1870 REM ***********************
1880 DIM X$[100],Y$[100]
1890 IF TYP(0)#3 THEN 1920
1900 Y9=-1
1910 RETURN
1920 X$=Y$
1930 READ Y$
1940 X9=X9+1
1950 PRINT TAB(2);X9;TAB(5);Y$
1960 LET Y8=LEN(Y$)
1970 FOR Y9=Y8 TO 1
1980 IF Y$[1,1]#" " THEN 2010
1990 Y$=Y$[2]
2000 NEXT Y9
2010 RETURN
2020 RETURN
2030 REM ***************************
2040 REM *
2050 REM * GET THE NEXT CHARACTER
2060 REM * IN C$ AND RETURN
2070 REM * CHAR TYPE IN C
2080 REM *
2090 REM ***************************
2100 DIM C$[1]
2110 FOR F6=1 TO 1000
2120 GOSUB 2220
2130 IF Q7=0 THEN 2150




2140 RETURN
2150 IF C$=" " THEN 2180
2160 F7=0
2170 RETURN
2180 IF F7<>0 THEN 2210
2190 F7=1
2200 RETURN
2210 NEXT F6
2220 IF P1<Y9 THEN 2280
2230 GOSUB 1830
2240 IF Y9 >= 0 THEN 2270
2250 C=13
2260 RETURN
2270 LET P1=0
2280 LET P1=P1+1
2290 LET C$=Y$[P1,P1]
2300 FOR I=1 TO 54
2310 IF B$[I,I]=C$ THEN 2350
2320 NEXT I
2330 E=1
2340 GOSUB 3070
2350 FOR C=1 TO 13
2360 IF I <= B[C] THEN 2380
2370 NEXT C
2380 RETURN
2390 REM ***************************************
2400 REM *
2410 REM * INITIALIZE SEARCH TABLES AND STRINGS
2420 REM *
2440 DIM K[15,2],F[10,2],O[20,2],D[10,2],B[20]
2450 DIM K$[100],F$[100],O$[100],D$[100],B$[100]
2460 F7=0
2470 Q7=0
2480 P1=0
2500 LET K9=10
2510 FOR J=1 TO 2
2511 FOR I=1 TO K9
2512 READ K[I,J]
2513 NEXT I
2514 NEXT J
2520 DATA 1,6,10,13,15,19,24,27,30,32
2530 DATA 5,9,12,14,18,23,26,29,31,36
2540 READ K$
2550 DATA "PRINTREADDIMIFTHENWHILEENDFORTOGOTO"
2560 LET F9=6
2570 FOR J=1 TO 2
2571 FOR I=1 TO F9
2572 READ F[I,J]
2573 NEXT I
2574 NEXT J
2580 DATA 1,4,7,10,13,16
2590 DATA 3,6,9,12,15,19
2600 READ F$
2610 DATA "ABSSQRSGNSINCOSATN"
2620 LET O9=13
2630 FOR J=1 TO 2
2631 FOR I=1 TO O9
2632 READ O[I,J]
2633 NEXT I




2634 NEXT J
2640 DATA 1,2,3,4,5,7,9,10,11,13,15,18,22
2650 DATA 1,2,3,4,6,8,9,10,12,14,17,21,23
2660 READ O$
2670 DATA "«♦■»/•••  t  ■<><*»«nOT  AK008"
2680 LET D9=5
2690 FOR J=1 TO 2
2691 FOR I=1 TO D9
2692 READ D[I,J]
2693 NEXT I
2694 NEXT J
2700 DATA 1,2,3,4,5
2710 DATA 1,2,3,4,5
2720 READ D$
2730 DATA "I t.l »«
2740 READ B$
2750 DATA "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789«0f . *■<> I ♦-/ i • (  )  •"
2760 B9=13
2770 MAT READ B[B9]
2780 DATA 26,36,39,40,41,42,44,45,47,48,53,54,55
2790 REM * INITIALIZE SEARCH
2800 REM * STRINGS AND
2810 REM * TABLES *
2820 REM
2830 DIM Q[10,2],I[20,2]
2840 DIM Q$[255],I$[255]
2850 LET Q9=0
2860 LET I9=0
2870 Q$=""
2880 I$=""
2890 REM *********************************
2900 REM * INITIALIZE CODE VECTOR AND *
2910 REM * AND STATE TABLE *
2920 REM *********************************
2930 DIM C[200,2],S[9,14]
2935 LET C6=200
2940 LET C9=0
2950 MAT READ S[9,14]
2960 DATA 4,2,5,3,6,-3.1,7,8,-3.1,9,-7.1,-2,1,-5
2970 DATA -4,2,-4,3,-4,-4,-4,-4,-4,-4,-4,-4,-4,-4
2980 DATA -4,3,-4,-4,-4,-4,-4,-4,-4,-4,-4,-4,-4,-4
2990 DATA 4,5,5,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1
3000 DATA 5,5,5,-6,-6,-6,-6,-6,-6,-6,-6,-6,-6,-6
3010 DATA -3,-3,-3,-3,-3.1,-3,-3,-3,-3,-3,-3,-3,-3,-3
3020 DATA -3,-3,-3,-3,-3,-3.1,-3,-3,-3,-3,-3,-3,-3,-3
3030 DATA -7,-7,-7,-7,-7,-3.1,-7,-7,-7,-7,-7,-7,-7,-7
3040 DATA -3,-3,-3,-3,-8.1,-3,-3,-3,-3,-3,-3,-3,-3,-3
3050 GOSUB 1830
3060 RETURN
3070 REM ******************************
3080 REM *                            *
3090 REM * ERROR DIAGNOSTIC GENERATOR *
3100 REM *                            *
3110 REM ******************************
3120 PRINT TAB(P1+7);"*"
3130 PRINT
3140 PRINT "ERROR: ";
3150 GOTO E OF 3160,3180,3200,3220,3240,3260,3300
3160 PRINT "ILLEGAL CHARACTER"
3170 GOTO 1010




3180 PRINT "TOTAL NUMBER OF STRING CONSTANTS > 10"
3190 GOTO 1010
3200 PRINT "TOTAL LENGTH OF ALL STRING CONSTANTS > 255"
3210 GOTO 1010
3220 PRINT "TOTAL LENGTH OF ALL IDENTIFIERS > 255"
3230 GOTO 1010
3240 PRINT "TOTAL NUMBER OF IDENTIFIERS > 20"
3250 GOTO 1010
3260 PRINT "TOTAL NUMBER OF SYMBOLS > 200"
3270 GOTO 1010
3300 PRINT "PROGRAM ENDS WITH OPEN COMMENT OR QUOTE"
3310 GOTO 1010




APPENDIX D

THE CONSTRUCTION OF EXPERIMENTAL TEST DATA SETS


White-box and black-box test data sets were constructed for each of the
four experimental programs. A black-box test data set was developed by
considering what different processes were identified by a program
specification; data were then created to cause one class of program
output to occur. Each black-box data set also included data
combinations to test the assumptions and restrictions cited in the
specification.

The specific development process for each program's black-box data
sets is outlined below:

•  ITAX - The input data for two fictitious individuals with
complementary and vastly different income and deduction
items were used to construct two test data sets.

•  LNPR - Six characteristics of linear programming problems
were identified: direction of constraint (in)equality,
unboundedness, constraint redundancy, infeasibility,
negative right hand sides, and negative objective function
coefficients. Six combinations of these attributes were
grouped and a linear programming problem was constructed
corresponding to each of these combinations.

•  OPTM - Five test data sets were constructed from problems
chosen from a text on the theory of equations.

•  SCNR - A sample program was developed which used at least
one member of each symbol class. Within the program, statements
were included to determine if strings and comments
were properly handled. The program was otherwise written
to be representative of a "normal" program.

The goal of black-box data construction was to develop test sets
representative of typical problems, but sufficiently diverse so as to verify
the functions and restrictions set down in the specifications. The
goal of white-box data construction is the development of a minimal
set of test cases which covers the program graph as completely as
possible. It is nearly always infeasible to force the execution of every
path sequence, but often possible at least to cover every program
branch and important combinations.

The procedure for white-box data set construction was similar for
each of the experimental programs. A program flowchart was developed
and test data sets calculated to ensure that every program branch was
taken. Then, a review of the program code was performed to determine
paths of execution which were related through the use of the same
variable(s). Data sets which forced execution of important branch
pairs were then added. The resulting test sets for ITAX and OPTM
appeared little different; because of the low logical complexity both
black- and white-box data provided reasonable flowgraph coverage.

Black- and white-box test sets for LNPR also appeared somewhat similar
since the program processes identified in the black-box procedure were
fairly transparently evident in the program logic. The black- and
white-box test sets for SCNR appear vastly different; while the
black-box set represents a comprehensively diverse "typical" program, the
white-box set represents a string of symbols designed to force the
finite state automaton through its permissible states.
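
The white-box goal stated above, a small set of test cases that together take every program branch, can be illustrated with a greedy covering heuristic. The Python sketch below is not the procedure used in this research, where test data were derived by hand from program flowcharts; it is a swapped-in technique shown only to make the idea concrete. The branch identifiers and test names are hypothetical, and a greedy choice yields a small covering set that is not guaranteed to be minimal.

```python
def greedy_branch_cover(candidates):
    """Pick test cases until every branch reached by any candidate is covered.

    candidates maps a test-case name to the set of branch identifiers
    that the test case exercises. Returns the chosen names in order.
    """
    remaining = set()
    for branches in candidates.values():
        remaining |= branches
    chosen = []
    while remaining:
        # Take the test that covers the most still-uncovered branches;
        # ties go to the alphabetically first name.
        best = max(sorted(candidates),
                   key=lambda name: len(candidates[name] & remaining))
        chosen.append(best)
        remaining -= candidates[best]
    return chosen

# Hypothetical example: four candidate tests over branches 1 through 4.
tests = {"t1": {1, 2, 3}, "t2": {3, 4}, "t3": {4}, "t4": {1}}
print(greedy_branch_cover(tests))
```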


The errors made within each of the four experimental programs were
assigned one of three main types: computational, logical, or data
handling. Each of these three categories was, in turn, decomposed
into subclasses of error types to which every program error was
assigned:

•  COMPUTATIONAL

-  Missing Computation: One or more statements necessary
for proper computation were forgotten.

-  Improper Expression: The computation of a value did not
match the specifications.

-  Machine Limitation: An expression, though ostensibly
matching the specifications, did not take into account
some machine or language limitation which invalidated it.

•  LOGICAL

-  Branch/Sequence: Statements which should follow one
another during execution did not as a result of improper
branching destination or statement permutation.

-  Wrong Boolean Expression: An IF condition was improperly
expressed.

-  Missing Logic: A situation which required special handling
was overlooked.

-  Redundancy: A statement or statement group was improperly
repeated.

•  DATA HANDLING

-  Improper Initialization: A variable was initialized
improperly through assignment or READ/DATA statements.

-  Wrong Variable Name: The wrong variable name was
obviously used.

-  Subscript/Substring: An improper array index value or
substring bound was employed.


The description of errors and designation into error types is given
in Tables E.1 through E.4. The breakdown of error type counts by
experimental program is shown in Table E.5. Since computational
errors are found in expressions and logical errors are more prevalent in
complex code, it is not surprising that OPTM (high computational
content, low logical complexity) and SCNR (low computational content,
high logical complexity) exhibit complementary error patterns, while

Table E.1

Error Description: OPTM

Error    Statement   Error                        Error    Error
Number   Numbers     Description                  Types    Subclass

1        80          B3 s.b. -B3                  Comp     Imp Exp
2        110         80/2 s.b. SO 3/4             Comp     Imp Exp
3        90          3**C1 s.b. 3*C1              Comp     Imp Exp
4        200         b* s.b. b)*                  Comp     Imp Exp
6        280/290     s.b. permuted                Logic    Wng Br/Seq
7        150/140     s.b. ABS(X) -333*SGN(X)      Comp     Mach Lim
8        470         s.b. GOTO 10                 Logic    Wng Br/Seq

Note. For this table and the three tables to follow, the
following abbreviations are being used:

Comp         Computational
Wng Op       Wrong Operand
Msg Cmp      Missing Computation
Imp Exp      Improper Expression
Mach Lim     Machine Limitation
Logic        Logical
Wng Br/Seq   Wrong Branch/Sequencing
Wng B Exp    Wrong Boolean Expression
Msg Lgc      Missing Logic
Rdndnt       Redundant
D/H          Data Handling
Imp Init     Improper Initialization
VBWN         Variable by Wrong Name
Sub/Sub      Subscript/Substring
s.b.         should be


Table E.3

Error Description: ITAX

Error    Statement                Error                      Error    Error
Number   Numbers                  Description                Types    Subclass

1.1      385                      Missing S0 = S0 + Y2       Comp     Msg Cmp
1.2      385                      Missing S1 = S1 + Y2       Comp     Msg Cmp
2        400                      S1 s.b. S0                 D/H      VBWN
3        600                      > s.b. <                   Logic    Wng B Exp
4        1260/1300                s.b. removed               Comp     Imp Exp
5/6      [illegible]/1660/1680    Tax computed incorrectly   Comp     Imp Exp
7        1620                     Redundant                  D/H      Imp Init
8        680                      ? s.b. 1                   Logic    Wng B Exp
9        1240                     01 s.b. 00                 D/H      VBWN

Table E.4

Error Description: SCNR

Error    Statement    Error                           Error    Error
Number   Number(s)    Description                     Type     Class

1        130-134      Identifier length test          Logic    Msg Lgc
2.1      240-244      Excess code pairs test          Logic    Msg Lgc
2.2      635-37       Excess code pairs test          Logic    Msg Lgc
2.3      1490-1494    Excess code pairs test          Logic    Msg Lgc
2.4      1530-34      Excess code pairs test          Logic    Msg Lgc
3        430          Forgot Q7=1                     D/H      Imp Init
4        610-30       Forgot Q7=0                     D/H      Imp Init
5        960          +P7)) s.b. -P7 + 1))            Comp     Imp Init
6.1      1425         First identifier check          Logic    Msg Lgc
6.2      1480         First identifier check          Logic    Msg Lgc
7        1520         Return s.b. Goto 1790           Logic    Wng Br/Seq
8        1539-1540    C(X s.b. C(C9                   D/H      Sub/Sub
9        1750         Z$ s.b. C$                      D/H      VBWN
10       1970         Forgot STEP -1                  Comp     Msg Cmp
11       2490         Forgot X9=0                     D/H      Imp Init
12       2750         )•" s.b. )'"                    D/H      Imp Init
13       2780         53, 54, 55 s.b. 52, 53, 54      D/H      Imp Init
14       3120         P1 + 5 s.b. P1 + 7              Comp     Imp Exp
15       557          Missing Q9=Q9-1                 Comp     Msg Cmp
16       1735-1737    End of symbol test              Logic    Msg Lgc
17       1905         Forgot P1=Y8                    Comp     Msg Cmp
18       530          1 to 100 s.b. 1 to 255          D/H      Imp Init
19       2530         36 s.b. 35                      D/H      Imp Init
20       2590         19 s.b. 18                      D/H      Imp Init
21       2640-2650    18/21,22/23 s.b. 18/20,21/22    D/H      Imp Init
22       2250         C=13 s.b. C=14                  D/H      Imp Init
23       2020         Return                          Logic    Rdndnt
24       1488         C9=C9+1 unnecessary             Logic    Rdndnt
25       248          C9=C9+1 unnecessary             Logic    Rdndnt
26       1415         17*1                            D/H      Imp Init
27       1427         s.b. deleted                    Logic    Wng Br/Seq

Table E.5

Breakdown of Error Type by Program

                                  ITAX   LNPR

Computational: Total                4      5
   Missing Comp                     2      3
   Imp Exp                          2      2
   Machine Limitation               0      0

Logical: Total                      2      7
   Branching/Sequencing             0      4
   Wrong Boolean Expression         2      3
   Missing Logic                    0      0
   Redundant Code                   0      0

Data Handling: Total                3      4
   Improper Initialization          1      2
   Variable by Wrong Name           2      0
   Subscript/Substring              0      2




ITAX  and  LNPR  have  more  balanced  error  counts.  The  propensity  to 
commit  data  handling  errors  appears  slightly  stronger  in  programs  of 
high  logical  complexity. 


APPENDIX  F 


CONTENTS  OF  THE  EXPERIMENTAL  PACKETS 


PARTICIPANT
ACCOUNT NUMBER,
PASSWORD


Dear Participant:

I wish to thank you for volunteering to participate in this programming
experiment. Your help in this research is not only invaluable
to me in my pursuit of a PhD, but more importantly, may help us, the
computing community, to better understand the factors affecting software
development.

The purpose of this experiment is to collect data concerning the
behavior of individuals engaged in program debugging. You will find,
enclosed in this packet, specifications for four (4) programs. These
programs have been carefully chosen and written to represent many types
of software developed today. They include

•  SCNR - a program which reads in statements in a fictitious
programming language, and converts each distinct
language symbol into a code.

•  OPTM - a program which solves for the roots of a fourth
degree polynomial.

•  LNPR - a linear programming program which employs the
primal and dual simplex algorithms to solve
linear optimization problems.

•  ITAX - a program which computes the income tax for an
individual.

It is expected that you will not be familiar with all of these
applications. In fact, you may not be acquainted with any of these problems,
but your efforts are just as important as those of participants who have
had greater experience with these applications.

All of the experimental programs (SCNR, OPTM, LNPR, ITAX) have
errors in them. These errors or "bugs" have not been artificially
inserted; they occurred during the normal course of program
development. IT IS YOUR TASK TO FIND THEM. The number and types of bugs
that you find, as well as the activities which lead to their discovery,
are the primary information that you will provide for this research.

We realize that everyone has his/her own method of program testing
and debugging. However, because the data collected must be comparable
from all participants, we ask that your debugging activities be
conducted in a pre-set way. During the orientation you will be told the
order in which you should select programs for debugging, and the amount
of time that you have for each of the four programs. It is important
that you

•  debug the four programs one at a time,

•  in the order that you are instructed to,

•  for exactly the amount of time allotted to you -
no more, no less.

During the course of the experiment you will always be engaged in
one of five (5) activities:

(1)  Test Data Development: for those of you who have not been
given data to use, you must take time out to construct data
to test an experimental program; when you begin and when you
end this activity, write the time on your ACTIVITY LOG,
included in this packet.

(2)  Terminal Work: every time you want to run the latest version
of an experimental program, you must log on one of the
designated terminals, make whatever changes to the program or data
that you have decided upon and run the program; the designated
terminals will all be hardcopy devices, so please write your
name, participant number, and the time on the terminal paper
before logging on; when you begin and when you end this
activity, write the time on your ACTIVITY LOG, included in this
packet.

(3)  Code Review/Error Detection: after leaving the terminal, pick
up whatever listings you have asked for from the lineprinter
and proceed to debug the program, by considering the listing
and results of the program run; when you begin and when you
end this activity, write the time on your ACTIVITY LOG,
included in this packet.

(4)  Error Correction: once you think that you have found an error,
terminate the Code Review/Error Detection activity, and begin
considering what program changes are necessary to correct the
error you have found. Log all intended changes on the PROGRAM
MODIFICATION LOG included in this packet. You may switch from
"looking for errors" to correcting them many times in one sit-


The experiment is to be conducted Sunday, January 27, in Bridge
Hall 208, on the USC campus. Directions are included in this packet.
Orientation will begin promptly at 8:00 A.M. Your early arrival will
be enthusiastically appreciated. We expect the experiment to last
approximately twelve hours with breaks for resting and meals. Thank
you again for your help.


EXPERIMENTAL TIMETABLE


8:00 A.M. - 9:00 A.M.      Experimental orientation,
                           distribution of materials

9:00 A.M. - 12:30 P.M.     Experimental Session I

12:30 P.M. - 1:00 P.M.     Lunch break

1:00 P.M. - 4:30 P.M.      Experimental Session II

4:30 P.M. - 5:00 P.M.      Dinner break

5:00 P.M. - 8:00 P.M.      Experimental Session III

8:00 P.M. - 8:30 P.M.      Collection of experimental
                           materials


TERMINAL RULES


(1)  You can only LIST, RUN, GET, or EDIT the last version of the
program that has been assigned to you.

(2)  You are not permitted to PURGE any program or file.

(3)  You are not permitted to make any program changes that were not
listed on your program modification log when you sat down at
the terminal.

(4)  You must rename any modified program to the next higher version
(e.g., LNPR04 becomes LNPR05), and SAVE before leaving the
terminal.


Terminal  Work 


PROGRAM MODIFICATION LOG


Subject:

Date:

Program:


VERSION BEING MODIFIED    DELETE    INSERT    MODIFY    COMMENTS    TIME

UNCLASSIFIED 


REPORT  DOCUMENTATION  PAGE 


REAI  INSTRUCTION; 
BEFORE  COMPLETIN' C  FORF 


A  T  E  m~  :  St-b»f  *lf 

A  Study  of  the  Factors  Affecting  Software 
Testing  Performance  and  Computer  Program 
Reliability  Growth 


s  type  refok’  6  per  o:  cdvEre: 

Technical 

e  RERFORminG  ORG  REPCO' 


PERFORM  ORGANIZATION  name  and  address 

Department  of  Management  and  Policy  Sciences 

University  of  Southern  California 

Los  Angeles,  Cal ifornia  90007  _ 


*•  COn' PC „  JNG  OFFICE  NAME  AND  ADDRESS 

Office  of  Naval  Research,  Code  434 
Arlington,  Virginia  22217 


I  1C  BROGBAN  E.EmEk'  >‘6;jE:‘  TaS» 
!  ARE*  *  BOR*  UN"  NUMBER' 


NR  042-323 


I J  NUMBER  O'  f  AGES 

278 


14. MONITORING AGENCY NAME & ADDRESS (if different from Controlling Office)

15. SECURITY CLASS. (of this report)

Unclassified

15a. DECLASSIFICATION/DOWNGRADING
SCHEDULE

16. DISTRIBUTION STATEMENT (of this Report)


Approved  for  public  release;  distribution  unlimited. 


17. DISTRIBUTION STATEMENT (of the abstract entered in Block 20, if different from Report)


19. KEY WORDS (Continue on reverse side if necessary and identify by block number)


Software  reliability,  software  testing  procedures,  testing  theory, 
reliability  growth,  software  reliability  models,  estimation  of  error 
content. 


20. ABSTRACT (Continue on reverse side if necessary and identify by block number)

An exploratory study was conducted to determine what personal, environmental
and program-related factors affect the process of discovering errors in
computer software. Two prevailing philosophies concerning program debugging
are the constructive and descriptive approaches. The constructive view of
program debugging is that characteristics of the software, the programmer,
and the environment in which they interact largely determine the speed and
thoroughness with which program errors will be eliminated. The descriptive
view is that the distribution of error discoveries conforms to probabilistic
models of reliability growth.

An experiment was conducted in which twenty-two subjects were given a fixed
amount of time to find and correct errors in each of four programs. The
programs were designed so that each represented one of four combinations of two
levels each of logical complexity and computational content. These programs
were thoroughly debugged by the author and a classification of the types and
frequencies of errors was recorded. All errors not related to program design
were then reinserted.

For each experimental program, a set of specification-based "black-box" data
and a set of program structure-based "white-box" data were developed. Each
subject was instructed to use four of these eight input data sets in
conformance with the experimental design.

During the course of the experiment, each subject partially debugged the
four experimental programs in preassigned order. Subjects recorded the
duration of each debugging activity and the times at which program errors were
discovered. Debugging performance was measured by the number of errors found,
as well as the distribution and ordering of error discovery times.

The  subject  group  exhibited  a  wide  range  of  ages,  experience  and  education. 
Analyses  indicated  that  a  reasonable  amount  of  recent  programming  experience 
was  more  important  than  any  other  determinant  of  debugging  performance. 

An analysis of the errors discovered by the subject group indicated that
logical and data-handling errors were less frequently found than computational
errors. Moreover, the visibility with which a program error disagreed with
the program specifications was directly related to its discovery frequency.
The wide range of error discovery frequencies confirms that some errors were
inherently more difficult to find than others. This conjecture was also
supported by a Chi-square analysis of discovery frequencies, and the
independence between error difficulty and subjects was confirmed by a test of
rank concordance among subjects.
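
As an illustration only, the Chi-square comparison of discovery frequencies
mentioned above can be sketched in Python. The abstract does not state the
exact form of the test, so this sketch assumes a test of homogeneity on a
2 x k table (found versus missed, for each of k errors). The function name
and the counts used in the note below are hypothetical and are not data from
the experiment.

```python
def chi_square_homogeneity(found_counts, n_subjects):
    """Chi-square statistic for equal discovery probability across errors.

    found_counts[i] is the number of subjects (out of n_subjects) who found
    error i. Returns (statistic, degrees_of_freedom). Assumed form of the
    test: homogeneity of a 2 x k table of found/missed counts.
    """
    k = len(found_counts)
    if k < 2:
        raise ValueError("need at least two errors to compare")
    # Pooled discovery proportion under the hypothesis of equal difficulty.
    pooled = sum(found_counts) / (k * n_subjects)
    if pooled <= 0.0 or pooled >= 1.0:
        raise ValueError("pooled proportion must lie strictly between 0 and 1")
    expected_found = n_subjects * pooled
    expected_missed = n_subjects * (1.0 - pooled)
    statistic = 0.0
    for found in found_counts:
        missed = n_subjects - found
        statistic += (found - expected_found) ** 2 / expected_found
        statistic += (missed - expected_missed) ** 2 / expected_missed
    return statistic, k - 1
```

A call such as chi_square_homogeneity([20, 15, 10, 5], 22) compares four
hypothetical errors across twenty-two subjects; the statistic would be
referred to a Chi-square distribution with k - 1 degrees of freedom. Because
the same subjects attempt every error, the observations are not strictly
independent, so a result of this kind is indicative rather than exact.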

An analysis of variance using logical complexity level, computational content
and test data type as factors showed that only the first two factors had a
significant effect on error discovery. A significantly negative interaction
effect of these factors pointed out the inadequacy of the
program-characteristic metrics employed.

Maximum likelihood estimates of subjects' proficiencies and errors' inherent
arrival rates were calculated to test whether or not discovery times were
exponentially distributed. Discovery times as a set exhibited a significantly
decreasing discovery rate over time, in contradiction to the assumptions of
the Jelinski-Moranda and Schick-Wolverton software reliability models. A
distributional model was developed and tested in which error discovery times
are modeled as order statistics on an underlying Pareto distribution. While
all three models overestimated the actual number of errors in each experimental
program, only the proposed model can readily accommodate the assumption of
non-constant discovery rates for individual errors.
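
As a rough sketch of how an error-content estimate of the kind discussed
above can be obtained, the following Python fragment fits the standard
Jelinski-Moranda model by maximum likelihood. It assumes the textbook form of
the model, in which the i-th interval between discoveries is exponential with
rate phi * (N - i + 1), and it searches integer values of N on a grid. The
function names, the grid limit, and the interval data in the note below are
hypothetical; the abstract does not describe the estimation procedure
actually used.

```python
import math


def jm_profile_loglik(n_total, gaps):
    """Profile log-likelihood of the Jelinski-Moranda model at a candidate N.

    gaps[i] is the time between discovery i and discovery i + 1 (first gap
    measured from the start of testing). Requires n_total >= len(gaps).
    Returns (log_likelihood, phi_hat), where phi_hat is the maximum
    likelihood per-error discovery rate for this N.
    """
    n = len(gaps)
    # Number of errors still in the program before each discovery.
    remaining = [n_total - i for i in range(n)]
    exposure = sum(r * t for r, t in zip(remaining, gaps))
    phi_hat = n / exposure
    # With phi = phi_hat, the term phi * exposure equals n.
    loglik = n * math.log(phi_hat) + sum(math.log(r) for r in remaining) - n
    return loglik, phi_hat


def jm_fit(gaps, max_total=1000):
    """Grid search for the integer N that maximizes the profile likelihood.

    If the result equals max_total, the data show no reliability growth and
    the estimate should be treated as unbounded rather than taken literally.
    """
    n = len(gaps)
    best_loglik, best_total, best_phi = None, None, None
    for n_total in range(n, max_total + 1):
        loglik, phi_hat = jm_profile_loglik(n_total, gaps)
        if best_loglik is None or loglik > best_loglik:
            best_loglik, best_total, best_phi = loglik, n_total, phi_hat
    return best_total, best_phi
```

For instance, jm_fit([1.0, 1.0, 2.0, 2.0, 3.0, 3.0]) returns an estimate of
the total error content and the per-error discovery rate for six hypothetical
discovery intervals. The constant per-error rate built into this likelihood
is the assumption that the abstract reports as contradicted by the observed
discovery times.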




